Abstract



Standard quantitative models of the stock market predict a log-normal
distribution for stock returns (Bachelier 1900, Osborne 1959), but it is
recognised (Fama 1965) that empirical data, in comparison with a Gaussian,
exhibit leptokurtosis (more probability mass in the tails and the centre) and
fat tails (the Gaussian underestimates the probabilities of extreme events).
Different attempts to explain this departure from normality have coexisted. In
particular, since one of the strong assumptions of the Gaussian model concerns
the volatility, considered finite and constant, the new models were built on a
non-finite (Mandelbrot 1963) or non-constant (Cox, Ingersoll and Ross 1985)
volatility.

We investigate in this thesis a very recent model (Dragulescu et al. 2002) based 
on a Brownian motion process for the returns, and a stochastic mean-reverting 
process for the volatility. In this model, the forward Kolmogorov equation that 
governs the time evolution of returns is solved analytically. We test this new 
theory against different stock indexes (Dow Jones Industrial Average, Standard 
and Poor's and Footsie), over different periods (from 20 to 105 years). Our aim 
is to compare this model with the classical Gaussian and with a simple Neural 
Network, used as a benchmark. 

We perform the usual statistical tests on the kurtosis and tails of the ex- 
pected distributions, paying particular attention to the outliers. As claimed by 
the authors, the new model outperforms the Gaussian for any time lag, but is 
artificially too complex for medium and low frequencies, where the Gaussian is 
preferable. Moreover this model is still rejected for high frequencies, at a 0.05 
level of significance, due to the kurtosis, incorrectly handled. 
Chapter 1



Introduction 



1.1 A two century old paradigm 

Many attempts have been made, since the first agreement to trade on the NYSE
in 1792, to model the stock market's behaviour. Understanding the patterns
that govern this heart of capitalism is more than challenging, it is a crusade.
But so far, can anybody claim to have found the rules enabling them to
predict tomorrow's move, for instance? And do these rules even exist? Actually,
the efficient markets theory states that market prices reflect the knowledge and
expectations of all investors. As a consequence, this theory predicts, according
to E. Fama [Fam65], that the market would react quickly to such a discovery and
these patterns would be modified instantly. Contrary to the natural laws that
govern physics, the laws of the market adjust themselves to new discoveries.

Anyhow, financial and scientific communities persist in building new models,
not only because we are eager to understand, but mainly because predicting
tomorrow's move is not the only way to make money. Whereas the study of
fundamentals and technical analysis are broadly used by traders on the market
floor, quantitative finance tries to evaluate risk, and hence to price options,
using statistical models of the market.

1.2 Aims 

We will present in this thesis the theory underlying most of these models, the
Theory of Random Walks. Then we will discuss the different hypotheses that have
been proposed, in the Bachelier-Osborne and Mandelbrot models, concerning
a major parameter, the standard deviation. This parameter is crucial since it
represents the volatility of the market, and is used for instance in Value-at-Risk.
After having confronted these models with empirical data, we will see that the
structure of the market implies neither a constant standard deviation, as thought
initially, nor an infinite standard deviation: the Probability Density Function
(PDF) of the log-returns exhibits fat tails and kurtosis (peakedness or flatness
compared to a Gaussian), with finite standard deviation.

Then we will focus on one of these models, published very recently by A. A.
Dragulescu and V. M. Yakovenko, from the University of Maryland. Their paper
[DY], "Probability distribution of returns with stochastic volatility", introduces
a new model for the volatility of stock market indexes. The proposed probability
density function of stock returns seems to fit empirical data much better than
previous models. We will double-check their results and propose a methodology
to test their model against empirical data.

But first, let's have a look at the different models of the market. Some are 
used by traders to try to predict tomorrow's move, others by derivative traders 
to price options or by risk managers to set the global policy of an Investment 
Bank, for instance. 

1.3 Usual Stock Market Prediction Methods 

On market floors, two radically different types of traders usually coexist: fun- 
damentalists and chartists. Fundamentalists believe that the stock price of a 
company reflects its intrinsic value. This intrinsic value depends on the present 
and forecast economic situation of the company (its "fundamentals"), and is 
mainly influenced by any new piece of information about these fundamentals. 
The problem is to evaluate this intrinsic value. 

On the other hand, chartists only analyse historical data (the "charts") of the
stock, mainly the historical price, but other indicators as well, such as traded
volumes, volatility, past resistance and support thresholds, etc. They try hard
to find hidden patterns that repeat over days, weeks, months or years,
according to their speculative or investment needs. The assumption is that the
market has a short- or long-term memory, so that the past can be used to predict
the future. But here, the efficient market hypothesis asserts that all information
which can be learned from technical analysis of stock prices is already reflected
in those prices. According to this hypothesis, past stock prices may be useful
to estimate the parameters of the distribution of future returns, but they do not
provide information which permits an investor to outperform the market.



1.4 Quantitative Methods 

Broadly speaking, none of those stock traders (fundamentalists or chartists) daily 
use the quantitative models we are describing in this thesis. But these models 
are used on the floor by derivative traders and in risk divisions of investment 
banks to elaborate the global trading policy of the bank, the risk aversion, the 
over-night limits of individual traders, etc. 

Now the reader won't be surprised to learn that quantitative models are neither
fundamentalist nor chartist. They are much more deeply involved with
mathematics. In fact, the theory underlying most of these models, called the
"Theory of Random Walks", claims that successive price changes ($P_\tau - P_0$)
or price returns ($P_\tau / P_0$) are independent, identically distributed random
variables. This i.i.d. hypothesis was studied by E. Fama in 1965 [Fam65] and is
still discussed today. We will discuss the independence of successive price
changes in Chapter 2, but for the moment we will study different models that
make this very strong assumption.

Under this assumption of i.i.d. price returns, many models have been developed,
but two of them are commonly used nowadays. The first and most common
one, called the Bachelier-Osborne model and elaborated in 1959, states that price
returns have a constant finite volatility over a given period of time ("time lag
$\tau$"), e.g. one day, one week, one month, etc. This theory results in a
log-normal distribution for price returns and a volatility proportional to the
square root of the time lag, i.e. the weekly volatility will be about $\sqrt{5}$
times higher than the daily one. But it is now known that price returns do not
follow a Gaussian distribution, since they exhibit kurtosis and fat tails:
dramatic drawdowns and spectacular jumps arise far more often than predicted by
a Normal distribution. Hence, the idea of infinite volatility appeared. It was
introduced by Mandelbrot in 1963 and leads to stable Pareto-Levy distributions
that can exhibit fat tails. Unfortunately, the hypothesis of infinite volatility
supposes that the variance increases indefinitely with sample size, which is not
verified by empirical data. Variance
first increases then reaches a bound |('B00j . 
1.5 Conclusion

For centuries, practitioners have tried to model and predict the financial
markets, using diverse techniques such as fundamental study or technical analysis.
With the rapid growth of statistics and stochastic calculus fifty years ago, new
quantitative methods were born that seem able to handle the complexity
of the stock market.

The main model, Bachelier-Osborne, will be detailed in Chapter 2. We will
see that it suffers mainly from two imperfections, high kurtosis and fat tails, that
still remain to be explained. A. Dragulescu and V. Yakovenko published in March
2002 an improvement of this model, based on a stochastic finite volatility, that
seems to fit the data perfectly. We will analyse this model in Chapter 3 and test
it in Chapter 4.

But first, let us present the basis of most stock return models, the Theory
of Random Walks.



Chapter 2 

The Theory of Random Walks 



2.1 Introduction 

The Theory of Random Walks has been used for the last 35 years by the main
statistical models of the stock markets. It was first introduced by Bachelier
in his 1900 dissertation written in Paris, "Théorie de la Spéculation" (and in
his subsequent work, esp. 1906, 1913), in which he anticipated much of what
was to become standard fare in financial theory: the random walk of financial
market prices, Brownian motion and martingales (all before both Einstein and
Wiener!). His innovativeness, however, was not appreciated by his professors
or contemporaries. His dissertation received poor marks from his teachers and,
consequently blackballed, he quickly dropped into the shadows of the academic
underground. After a series of minor posts, he ended up obscurely teaching in
Besançon for much of the rest of his life. Virtually nothing else is known of this
pioneer, his work being largely ignored until the 1960s, when Osborne introduced
his model based on Bachelier's work.

A random walk is a random process consisting of a sequence of discrete steps 
of fixed length totally independent one from another. For instance, the random 
thermal perturbations in a liquid are responsible for a random walk phenomenon 
known as Brownian motion, and the collisions of molecules in a gas are a random 
walk responsible for diffusion. 

Applied to our problem, this theory is founded on two strong hypotheses: price
returns are independent (tomorrow's price return does not depend on today's or
on any other price return) and identically distributed (they all follow the same
distribution). This is called the i.i.d. hypothesis.

Throughout this thesis, we will use the following notation:

price change:  $c_t = S_t - S_{t-\tau}$
price return:  $S_t / S_{t-\tau}$
log return:    $r_t = \log(S_t / S_{t-\tau})$

where log is the natural logarithm,
$S_t$ is the close price of the security at time $t$,
$S_{t-\tau}$ is the close price of the security at time $t - \tau$,
$\tau$ is the time lag.

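For instance, if the close price of the studied index is 106 today and was 105
yesterday, the daily (time lag $\tau = 1$) values are a price change of 1, a
price return of about 1.0095 and a log return of about $9.48 \times 10^{-3}$.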



By "day", we mean trading day, since all of our datasets are composed of
trading days only: week-ends and bank holidays have been removed. By "time
lag", we mean the number of days between two points used to compute a log
return. If our initial dataset is composed of 1000 close prices, then for a time lag
of five days, we will take one point every five to compute the log returns. As a
consequence, our final dataset will be composed of $[1000/5] = 200$ log returns
only. Nevertheless, we can begin with the first, the second, the third, the fourth or
the fifth close price, so that finally we can use five different datasets of 200 log
returns. This will allow us to give, for any computation, the average value and
an estimate of the variance of the result, which will give greater robustness to
our statistical tests.
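
The subsampling scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used for the thesis: the helper name `log_return_paths` and the synthetic price series are hypothetical, and 1005 prices are used so that each of the five paths contains exactly 200 log returns.

```python
import math

def log_return_paths(prices, lag):
    """Return `lag` lists of non-overlapping log returns, one per starting offset.

    Path k uses the close prices at positions k, k + lag, k + 2*lag, ...
    (hypothetical helper, not taken from the thesis).
    """
    paths = []
    for offset in range(lag):
        sampled = prices[offset::lag]          # one close price every `lag` days
        returns = [math.log(b / a) for a, b in zip(sampled, sampled[1:])]
        paths.append(returns)
    return paths

# Synthetic close prices, for illustration only: a steady 0.1% rise per day.
prices = [100.0 * 1.001 ** t for t in range(1005)]
paths = log_return_paths(prices, lag=5)
print(len(paths), [len(p) for p in paths])
```

Any statistic (kurtosis, test outcome) can then be computed on each path separately, and its mean and spread across the paths reported.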

We will usually use the log return, mainly for two reasons:

1. financially, it corresponds to the continuously compounded return of the
asset S;*

2. numerically, it has the advantage of guaranteeing the positivity of the price.

Obviously, any hypothesis about the independence and identical distribution
of price changes is directly applicable to price returns and log returns.

Example (close price 106 today, 105 yesterday, $\tau = 1$):

price change       price return          log return
106 - 105 = 1      106/105 ≈ 1.0095      log(106/105) ≈ 9.48 × 10^-3

* If $\alpha$ is the continuously compounded return of an asset S, then the value
$S_t$ of S at time $t$ is $S_t = S_0 e^{\alpha t}$ for $t \in [0, T]$.

According to the theory of random walks, the price change series is a collection
of random variables having the property that, given the present, the future is
conditionally independent of the past. In other words,

$$P(c_t = c \mid c_{t-1}, c_{t-2}, \ldots) = P(c_t = c)$$

or, formulated directly in terms of the price $S_t$ at time $t$,

$$P(S_t = S \mid S_{t-1}, S_{t-2}, \ldots) = P(S_t = S \mid S_{t-1})$$

To sum up, in such a process without memory, the last realisation contains
all of the information. This process is known as a Markov process.
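
The two statements above can be illustrated numerically. The sketch below is an assumption-laden toy example (Gaussian changes, an additive walk, the hypothetical helper `lag1_autocorrelation`): the simulated price changes show no memory, whereas the price level is almost entirely determined by the previous price.

```python
import random

def lag1_autocorrelation(xs):
    """Sample autocorrelation at lag 1 (hypothetical helper)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

rng = random.Random(0)                                   # fixed seed, repeatable run
changes = [rng.gauss(0.0, 0.1) for _ in range(10_000)]   # i.i.d. price changes c_t

prices = [100.0]                                         # S_t = S_{t-1} + c_t
for c in changes:
    prices.append(prices[-1] + c)

print(lag1_autocorrelation(changes))   # close to 0: no memory in the changes
print(lag1_autocorrelation(prices))    # close to 1: S_t depends on S_{t-1}
```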

Before we go more deeply in the mathematics of the theory of random walks, 
and study the main statistical model of stock market behaviour, the Bachelier- 
Osborne model, let us discuss the i.i.d. hypothesis itself. 

2.2 Independence of price returns 

As said before, all of the assumptions about price returns can be applied to price 
changes and log returns. Since we will not consider the mathematical aspect of the 
theory in this paragraph, we will prefer to use the price returns, simpler to tackle 
and often discussed in the financial press in terms of percentage of variation. 

The hypothesis of independent price returns is extremely important, and
controversial, since it underlies all of the theory of random walks, and so all of
the models developed around it. E. Fama discussed this hypothesis at length
in his paper "The Behavior of Stock-Market Prices" [Fam65] and states that the
independence of price returns is the result of a noisy price mechanism. By noise,
one should understand the psychology of different traders and the uncertainty
or disagreement about the intrinsic value of the security, which depends on new
information arrived or about to arrive. If successive bits of new information arise
independently across time and if noise or uncertainty concerning intrinsic value
does not tend to follow any consistent pattern, then successive price returns in a
common stock will be independent.

A third and crucial condition for independence of price returns is the existence
of "superior traders", viz. traders who will detect abnormalities in stock prices
(departures of the security price from its intrinsic value) and will correct them by
buying (resp. selling) the security if it is underestimated (resp. overestimated).
If there are enough such traders, then the price will tend to stabilise around its
intrinsic value, reducing risks of bubbles or crashes.

In the light of the recent scandals about the conflicts of interest of financial
analysts working for the largest Investment Banks that participated in the
creation of the speculative bubble around the "new economy", it is legitimate to
wonder whether this last condition stated by Fama is still satisfied, and hence
whether the hypothesis of independent price returns still holds. But this problem
is outside the scope of this thesis, and from now on we will assume that
the hypotheses underlying the Random Walk are satisfied: price returns will be
considered independent and identically distributed.

Let us have a look now at the classical statistical model of the stock market, 
the Geometric Brownian Motion. 

2.3 Bachelier-Osborne Model 

The basic theory, known as the Bachelier-Osborne model, states that the stock
index price $S_t$ follows a Geometric Brownian Motion (GBM). The description
"Brownian motion" comes from the fact that the same process describes the
physical motion of a particle subject to random shocks, a phenomenon first noted
by the British physicist Brown in 1828, observing irregular movement of pollen
suspended in water. The first mathematically rigorous construction of Brownian
motion was carried out by Wiener in 1923. This theory is based on Markov
processes, Wiener processes and Ito processes, which are detailed in Appendix
A. We summarise here the basic idea of the GBM.

Let the price $S = \{S_t;\ t = 0, 1, \ldots, T\}$ be a non-negative stochastic
process, and the log return $r_t = \log(S_t / S_{t-\tau})$.

The idea, first introduced by Bachelier [Bac00], even if he used a Brownian
motion and not a geometric Brownian motion, is that for a given time lag $\tau$, the
log return $r_t$ is the sum of a large number of i.i.d. random variables $\Delta r_i$:

$$r_t = \sum_{i=1}^{n} \Delta r_i$$

Then, if we assume that the distribution followed by the $\Delta r_i$ has finite
moments, and especially finite variance, the Central Limit Theorem states
directly that $r_t$ must follow a Normal distribution. Osborne formalised this in 1959
using the following equation, called Geometric Brownian Motion, for the stock
price $S_t$

$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t \qquad (2.1)$$

where $\mu$ and $\sigma$ are two constants called the drift and the volatility, and
$W$ is a standard Wiener process.*

* See Appendix A. The increments of $W$, $dW_t$, are normally distributed with
$E[dW_t] = 0$ and $\mathrm{Var}[dW_t] = dt$.

We will demonstrate that $\log S_t$ obeys a simple Brownian motion with an
instantaneous expectation $\mu - \sigma^2/2$ and an instantaneous variance $\sigma^2$.

We start from (2.1) and apply Ito's lemma to the following function

$$f : S(t) \mapsto \log(S(t)) = \log S_t$$

We obtain

$$d\log S_t = \frac{\partial \log S_t}{\partial t}\,dt + \frac{\partial \log S_t}{\partial S}\,dS_t + \frac{1}{2}\frac{\partial^2 \log S_t}{\partial S^2}\,(dS_t)^2$$

$$d\log S_t = \frac{dS_t}{S_t} - \frac{1}{2}\frac{(dS_t)^2}{S_t^2} \qquad (2.2)$$

since $\partial \log X / \partial X = 1/X$ and $\partial^2 \log X / \partial X^2 = -1/X^2$.

Besides

$$\left(\frac{dS_t}{S_t}\right)^2 = \mu^2 (dt)^2 + \sigma^2 (dW_t)^2 + 2\mu\sigma\,dt\,dW_t$$

$$= \sigma^2 (dW_t)^2 + O\big((dt)^{3/2}\big) \quad \text{since } (dW_t)^2 = O(dt)$$

$$= \sigma^2\,dt + o(dt) \qquad (2.3)$$

where $g(t) = O(f(t))$ means that $g$ is of the order of $f$ as $t \to 0$, and
$g(t) = o(t)$ means that $\lim_{t \to 0} g(t)/t = 0$.

From (2.1), (2.2) and (2.3) we deduce

$$d\log S_t = \frac{dS_t}{S_t} - \frac{1}{2}\left(\frac{dS_t}{S_t}\right)^2$$

$$= \mu\,dt + \sigma\,dW_t - \frac{1}{2}\sigma^2\,dt + o(dt)$$

$$= \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW_t + o(dt) \qquad (2.4)$$

From (2.4) we see that $d\log S_t$ follows a simple Brownian motion. If
$r_t = \log(S_t / S_0)$, then $r_t$ is governed by the following equation:

$$dr_t = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW_t \qquad (2.5)$$

Log returns follow a simple Brownian motion, and are then normally
distributed. Indeed, (2.5) admits the solution

$$r_t = \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t \qquad (2.6)$$

Or formulated differently

$$S_t = S_0\, e^{(\mu - \sigma^2/2)\,t + \sigma W_t} \qquad (2.7)$$

This model is known as the Bachelier-Osborne model, and predicts a
log-normal distribution for the price $S_t$.
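
As a numerical check of (2.6) and (2.7), the following sketch samples a GBM path from the exact solution and compares the sample mean and standard deviation of the log returns with $(\mu - \sigma^2/2)\tau$ and $\sigma\sqrt{\tau}$. The parameter values, the seed and the helper names `simulate_gbm` and `log_returns` are assumptions chosen for illustration only.

```python
import math
import random
import statistics

def simulate_gbm(s0, mu, sigma, n_steps, dt, seed=0):
    """Sample a GBM path by applying the exact solution (2.7) over steps of length dt."""
    rng = random.Random(seed)
    prices = [s0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))            # Wiener increment, variance dt
        prices.append(prices[-1] * math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * dw))
    return prices

def log_returns(prices, lag):
    """Non-overlapping log returns for the given time lag."""
    sampled = prices[::lag]
    return [math.log(b / a) for a, b in zip(sampled, sampled[1:])]

# Illustrative parameters (assumptions), with time measured in trading days.
mu, sigma, dt = 0.0005, 0.01, 1.0
prices = simulate_gbm(100.0, mu, sigma, n_steps=20_000, dt=dt)

daily = log_returns(prices, lag=1)
weekly = log_returns(prices, lag=5)
print(statistics.mean(daily), (mu - 0.5 * sigma ** 2) * dt)    # sample vs. theoretical drift
print(statistics.stdev(daily), sigma * math.sqrt(dt))          # sample vs. theoretical volatility
print(statistics.stdev(weekly) / statistics.stdev(daily), math.sqrt(5))
```

The last line illustrates the square-root scaling of volatility with the time lag: under the model, the 5-day volatility is about $\sqrt{5}$ times the daily one.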



2.4 Departure from normality 

According to the GBM model, stock prices should be log-normally distributed,
i.e. stock log-returns should be normally distributed:

$$r_t = \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t$$

Nevertheless, it is now well known that log returns exhibit two specific kinds
of departure from a Gaussian: kurtosis and fat tails. A high kurtosis means that
the distribution is more peaked than a Gaussian around the mean. Fat
tails mean that crashes and huge increases appear far more often than predicted
by the normal law. Let us consider the probability density function (PDF) of log
returns for the different datasets we have:

• DJIA1982: Dow Jones Industrial Average from January 04, 1982 to
December 31, 2001

• DJIA1988: Dow Jones Industrial Average from January 04, 1988 to
December 31, 2001

• DJIA1930: Dow Jones Industrial Average from January 02, 1930 to
December 31, 2001

• DJIA1896: Dow Jones Industrial Average from May 26, 1896 to December
31, 2001

• SP1965: Standard and Poor's 500 from January 04, 1965 to December 31,
2001

• FTSE1984: FTSE 100 from January 04, 1984 to December 31, 2001

We will perform a few tests to exhibit more precisely the kurtosis and fat
tails. These tests will be reproduced later on Dragulescu's model. Before we
perform our tests, let us have a look at a few examples of PDFs: we plot the
PDFs for different time lags ($\tau$ = 1, 5, 20 and 250 trading days) against a Normal
distribution based on the sample mean and variance.

The issue is whether the Normal distribution fits empirical data sufficiently 
well, or whether the model should be rejected. 

2.4.1 Measure of kurtosis: Jarque-Bera Test 

If log returns really follow a Normal distribution, then we expect a null
value for the Fisher kurtosis.* Kurtosis is a measure of whether the data are
peaked or flat relative to a normal distribution. That is, data sets with high
kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and
have heavy tails. We compute the kurtosis for each dataset, and for different
time lags. As explained above, we can give an estimate of the standard deviation
(indicated after the ± sign) of the kurtosis for time lags greater than 1, because
we compute the kurtosis on many different datasets, or "paths". Our results are
presented in Tables 2.1, 2.2 and 2.3.

* See Appendix B, Section "Kurtosis"

Figure 2.1: First look at the fat tails



time lag    DJIA1982         DJIA1988
1           69.27            5.963
5           16.87 ±16.2      3.07 ±0.91
20          7.75 ±4.77       1.43 ±0.93
40          5.69 ±2.40       1.18 ±0.74
80          2.75 ±1.21       0.86 ±1.45
100         1.68 ±0.99       0.25 ±0.88
200         -0.06 ±0.57      -0.59 ±0.37
250         -0.52 ±0.37      -0.55 ±0.58

Table 2.1: Measure of kurtosis for DJIA1982 & DJIA1988

time lag    DJIA1896         DJIA1930
1           26.81            27.38
5           12.55 ±2.36      9.60 ±3.82
20          8.26 ±1.60       8.40 ±1.83
40          5.85 ±1.29       7.44 ±2.05
80          3.51 ±1.30       5.43 ±2.25
100         2.65 ±0.90       4.29 ±1.68
200         3.16 ±2.49       4.99 ±3.09
250         2.65 ±2.14       4.39 ±2.77

Table 2.2: Measure of kurtosis for DJIA1896 & DJIA1930

time lag    SP1965           FTSE1984
1           42.08            12.72
5           9.26 ±8.46       12.34 ±5.46
20          3.90 ±2.28       10.67 ±5.99
40          2.95 ±1.26       5.48 ±2.44
80          1.18 ±0.85       2.70 ±2.08
100         0.67 ±0.62       2.10 ±1.70
200         0.07 ±0.52       0.14 ±0.86
250         0.01 ±0.70       -0.04 ±0.74

Table 2.3: Measure of kurtosis for SP1965 & FTSE1984

We can clearly see that for every dataset, empirical log returns exhibit high
kurtosis for high frequencies (time lag = 1 and 5 days) and medium frequencies
(time lag = 20, 40 and 80 days), and small (for the largest datasets only, DJIA1896
and DJIA1930) or no kurtosis for low frequencies (time lag = 200 and 250 days).
But the standard deviation is very high compared with the mean value, which
means that some paths exhibit very high kurtosis whereas others exhibit very
little. For instance, let us have a look at the five paths of the 5-day time lag
returns of the DJIA1982. Results are shown in Table 2.4.

Table 2.4: Example of kurtosis for each path, on DJIA1982, with $\tau$ = 5

The kurtosis goes from 3.99 to 38.20, with a mean and standard deviation
of 16.87 and 16.20 respectively. Only two paths out of five exhibit very high
kurtosis, which is enough to have a high mean, but the large standard deviation
must remind us of the important heterogeneity of the different paths.
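
The per-path computation can be sketched as follows. The function `excess_kurtosis` is a hypothetical helper that uses population (biased) moments, which is an assumption about the estimator; the five values are those quoted for Table 2.4, and the sample standard deviation (divisor n - 1) reproduces the figures given in the text.

```python
import statistics

def excess_kurtosis(xs):
    """Fisher kurtosis m4 / m2**2 - 3, using population (biased) moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

# Per-path kurtosis values quoted in Table 2.4 (DJIA1982, 5-day time lag).
path_kurtosis = [38.20, 30.49, 5.91, 5.77, 3.99]
print(round(statistics.mean(path_kurtosis), 2))    # 16.87
print(round(statistics.stdev(path_kurtosis), 2))   # 16.2
```

In practice `excess_kurtosis` would be applied to each path of log returns, and the mean and standard deviation of the results reported as in Tables 2.1 to 2.3.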

On average, the probability mass of empirical log returns is leptokurtic* for
high frequencies. This departure from normality should enable us to reject the
normal hypothesis.

To verify, we perform a Jarque-Bera test, which tests the goodness-of-fit to a
normal distribution, according to the skewness and kurtosis.† It tests a composite
hypothesis, which means that the parameters of the tested distribution, viz. the
mean and variance of the normal distribution, can be derived from the empirical
data, and do not need to be known in advance. For each path, the output of
the test is 0 if we do not reject the null hypothesis (viz. the normal hypothesis)
at a significance level $\alpha$ = 0.05, and 1 if we reject it. We give in Table 2.5 the
average of the tests. For instance, for a time lag of 80 days, we perform the
Jarque-Bera test 80 times, on 80 different log returns datasets. An average value
of 0.9 means that the test rejected the null hypothesis 90% of the time, i.e. 72
out of 80 datasets.

For each dataset, for high frequencies, the normal hypothesis is systematically
rejected. For low frequencies, except for the largest datasets (DJIA1896
and DJIA1930), which exhibit small kurtosis, the normal distribution cannot be
rejected. The conclusion is not straightforward for middle frequencies.
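
The averaging of test outcomes over paths can be sketched as below. This is a minimal version that relies on the asymptotic chi-square distribution with 2 degrees of freedom (whose survival function is exp(-x/2)); the thesis may have used tabulated critical values instead, which matters for the short samples obtained at large time lags. The helper names and the two deterministic toy "paths" are assumptions.

```python
import math
from statistics import NormalDist

def jarque_bera(xs):
    """Return (JB statistic, asymptotic p-value) for the sample xs."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0                 # Fisher (excess) kurtosis
    jb = n / 6.0 * (skewness ** 2 + kurtosis ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)                 # chi-square(2) survival function
    return jb, p_value

def rejection_rate(paths, alpha=0.05):
    """Proportion of paths for which normality is rejected at level alpha."""
    rejected = [1 if jarque_bera(p)[1] < alpha else 0 for p in paths]
    return sum(rejected) / len(rejected)

# Two deterministic toy paths: normal quantiles and Student-t (2 d.o.f.) quantiles.
n = 1000
u = [(i + 0.5) / n for i in range(n)]
normal_path = [NormalDist().inv_cdf(p) for p in u]
heavy_path = [(2 * p - 1) / math.sqrt(2 * p * (1 - p)) for p in u]
print(rejection_rate([normal_path, heavy_path]))   # 0.5
```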

* See Appendix B, Section "Measures of kurtosis"

† See Appendix B, Section "Jarque-Bera Goodness-of-Fit Test"

Kurtosis for each path (DJIA1982, $\tau$ = 5):

path    kurtosis
1       38.20
2       30.49
3       5.91
4       5.77
5       3.99

time lag    DJIA1988    DJIA1982    DJIA1930    DJIA1896    SP1965    FTSE1984
1           1           1           1           1           1         1
5           1           1           1           1           1         1
20          0.65        1           1           1           1         1
40          0.625       1           1           1           1         0.975
80          0.2375      0.9         1           1           0.5375    0.8
100         0.08        0.6         1           1           0.23      0.54
200         0           0           1           0.97        0.01      0
250         0           0           1           0.964       0.064     0

Table 2.5: Proportion of paths for which the Jarque-Bera Goodness-of-Fit Test
rejects $H_0$



2.4.2 Fat tails: Normal Probability Plot and Lilliefors Test

The other important departure from normality consists of fat tails, which can be
exhibited with a probability plot. This time, we do not perform a test
on each dataset and each time lag, since a few examples should be enough. We
select the first dataset, the Dow Jones Industrial Average from January 04, 1982
to December 31, 2001, and draw the Normal Probability Plot of the log returns.
The results are presented in Figure 2.2.

We can draw the following conclusions from this plot:

1. The normal probability plot shows a non-linear pattern;

2. The normal distribution is not a good model for these data.

For data with short (less variance than expected in a normal distribution)
or long (more variance than expected in a normal distribution) tails relative to
the normal distribution, the non-linearity of the normal probability plot shows
up in two ways. First, the middle of the data shows an S-like pattern. This is
common for both short and long tails. Second, the first few and the last few
points show a marked departure from the reference fitted line. For short tails,
the first few points show increasing departure from the fitted line above the line
and the last few points show increasing departure from the fitted line below the line
($\tau$ = 1, 5 and 20 days). For long tails, this pattern is reversed ($\tau$ = 250 days).

In this case, we can reasonably conclude that the normal distribution does
not provide an adequate fit for this dataset, for high frequencies. To confirm this,
we perform a Lilliefors Goodness-of-Fit Test.

Figure 2.2: Normal Probability Plot for $\tau$ = 1, 5, 20 and 250 days

The Lilliefors test assesses the goodness of fit to a normal distribution. It is
derived from the Kolmogorov-Smirnov test, with the difference that it tests a
composite hypothesis and not a simple hypothesis.* The difference with the
Jarque-Bera test is that this one is based on the maximum departure of the
empirical distribution from the normal distribution, so this test will tend to
reject the null hypothesis in the presence of kurtosis and fat tails. We perform
this test for each index and each dataset. For each path, the output of the test
is 0 if we do not reject the null hypothesis (viz. the normal hypothesis) at a
significance level $\alpha$ = 0.05, and 1 if we reject it. We give in Table 2.6 the
average of the tests. For instance, for a time lag of 80 days, we perform the
Lilliefors test 80 times, on 80 different log returns datasets. An average value
of 0.1875 means that we rejected the null hypothesis 0.1875 * 80 = 15 times out
of 80.
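
A minimal sketch of the Lilliefors decision rule is given below. It computes the Kolmogorov-Smirnov distance to a normal law with estimated parameters and compares it with the classical large-sample critical value 0.886/sqrt(n) at the 5% level (an approximation usually quoted for n > 30; statistical packages use finer tables, so results on short paths may differ). Function names and the toy samples are assumptions.

```python
import math
from statistics import NormalDist, mean, stdev

def lilliefors_statistic(xs):
    """Kolmogorov-Smirnov distance between the sample and a normal law
    whose mean and standard deviation are estimated from the sample itself."""
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(sorted(xs)):
        cdf = fitted.cdf(x)
        # Compare with the empirical CDF just after and just before the jump at x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def lilliefors_rejects(xs):
    """Decision at the 5% level, using the large-sample critical value 0.886 / sqrt(n)."""
    return lilliefors_statistic(xs) > 0.886 / math.sqrt(len(xs))

# Deterministic toy samples: normal quantiles and Student-t (2 d.o.f.) quantiles.
n = 1000
u = [(i + 0.5) / n for i in range(n)]
normal_sample = [NormalDist().inv_cdf(p) for p in u]
heavy_sample = [(2 * p - 1) / math.sqrt(2 * p * (1 - p)) for p in u]
print(lilliefors_rejects(normal_sample), lilliefors_rejects(heavy_sample))
```

Averaging the 0/1 outcomes of `lilliefors_rejects` over all paths of a given time lag gives a proportion of the kind reported in Table 2.6.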

The Lilliefors test rejects the normal hypothesis for high frequencies, but not
for low frequencies. Again, for large datasets, the normal hypothesis is more
often rejected, even for low frequencies. We believe rejection comes from the fact
that kurtosis and fat tails are due to outliers, events that are expected to happen
once in a century by the Bachelier-Osborne model, but that occur far more often.
Even if these events happen more often, they are still rare enough to be absent
from datasets that are too small, especially for low frequencies where the number
of points in each dataset is very low. This issue is not investigated in this thesis,
and remains to be resolved.

* See Appendix B, Section "Lilliefors Goodness-of-Fit Test"

time lag    DJIA1988    DJIA1982    DJIA1930    DJIA1896    SP1965    FTSE1984
1           1           1           1           1           1         1
5           1           1           1           1           1         1
20          0.4         0.85        1           1           0.8       0.65
40          0.075       0.725       1           1           0.675     0.425
80          0.025       0.1875      1           1           0.0625    0.1625
100         0           0.06        1           1           0.03      0.08
200         0.035       0.04        0.61        0.41        0.1       0.035
250         0.02        0.024       0.532       0.376       0.064     0.112

Table 2.6: Proportion of paths for which the Lilliefors Goodness-of-Fit Test rejects
$H_0$

2.5 Conclusion 

We have described in this chapter the theory underlying most statistical models
of the stock market, the Random Walk Theory. The first model to use this theory
was the Bachelier-Osborne model (1959), which predicts a normal distribution for
log returns. Even though this model remains widely used, notably by Black and
Scholes in their famous model for option pricing, the empirical data show a clear
departure from normality for high frequencies ($\tau$ = 1 and 5 days): the observed
distribution is leptokurtic and exhibits fat tails. For low frequencies ($\tau$ = 200
and 250 days), the normal hypothesis cannot be rejected. The conclusion is not
straightforward for medium frequencies ($\tau$ = 20, 40, 80 and 100 days).

Since 1959, some attempts have been made to produce a better model for
log returns (stable Pareto-Levy distributions [Fam65], exponentially truncated
power law, etc.), a model that would particularly fit the kurtosis and fat tails of
the empirical distribution. But so far, all of them have suffered from strong
criticism. We investigate in the next chapter a recent model, proposed by
Dragulescu et al. in 2002, based on a stochastic mean-reverting process for the
volatility.



Chapter 3 
Dragulescu's model 



3.1 Introduction 

Mainly because of kurtosis and fat tails, we have to find better models for
stock market returns than the Gaussian. Above all, the normal distribution fails
to describe the most important phenomena, drawdowns and bubbles, which occur far
more often than expected, as shown by the fat tails.

Hence, we have to review some of the hypotheses that lead to the classic
Gaussian model, in order to improve upon it. Many models exist that try to
explain or produce fat tails. The main assumptions of the Gaussian model concern
(1) the independence of log-returns and (2) the finite constant volatility. If
we assume that log-returns are really independent and identically distributed,
then only a non-constant or non-finite volatility could explain the departure
from normality observed in the empirical data.

In the Gaussian hypothesis, the instantaneous volatility takes the form
$\sigma S_t$, where $\sigma$ is a finite constant and $S_t$ is the security
price at time $t$. Some models, e.g. stable Paretian distributions ([Fam65]),
consider the volatility as infinite and produce fat tails as in empirical data.
Nevertheless, the assumption of infinite volatility does not appear to be
relevant, since the volatility does not grow indefinitely with the sample size.
Another innovation is to consider a finite stochastic volatility. This class of
models was introduced by Hull and White. We study here one of these models,
proposed by A. Dragulescu and V. Yakovenko. We will reproduce their results
following their methodology, make a few critical comments on the way they trim
the data, and propose a methodology to test their model against the empirical
distribution of log returns.
3.2 Mean-reverting stochastic volatility

This model starts from a geometric Brownian motion stochastic differential
equation for the price $S_t$

$$ dS_t = \mu S_t\,dt + \sigma_t S_t\,dW_t^{(1)} \qquad (3.1) $$

where $W_t^{(1)}$ is a standard Wiener process.

Log returns $r_t = \log(S_t / S_0)$ and centred log returns $x_t = r_t - \mu t$
are introduced. From (3.1) we get

$$ dr_t = \left(\mu - \frac{v_t}{2}\right) dt + \sqrt{v_t}\,dW_t^{(1)} \quad \text{since } \sigma_t = \sqrt{v_t} \qquad (3.2) $$

and

$$ dx_t = -\frac{v_t}{2}\,dt + \sqrt{v_t}\,dW_t^{(1)} \qquad (3.3) $$

Then instead of having a constant volatility $\sigma_t = \sigma$ as in the
Bachelier-Osborne model, the authors assume the variance $v_t = \sigma_t^2$
obeys the following mean-reverting stochastic differential equation

$$ dv_t = -\gamma (v_t - \theta)\,dt + \kappa \sqrt{v_t}\,dW_t^{(2)} \qquad (3.4) $$

where $v_t = \sigma_t^2$,
      $\theta$ is the long time mean of $v_t$,
      $\gamma$ is the rate of relaxation to this mean,
      $\kappa$ is a constant parameter called the variance noise,
      $W_t^{(2)}$ is another standard Wiener process, not necessarily
      correlated with $W_t^{(1)}$.

This model for the variance, known as the CIR model, was first proposed by Cox,
Ingersoll and Ross [CIR85] in an attempt to price options.
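To make the two coupled equations concrete, the following is a minimal simulation sketch of (3.3) and (3.4) using a plain Euler-Maruyama discretisation. The scheme, the function names, the choice of starting the variance at $\theta$, and the clamping of the discretised variance at zero are all assumptions made for illustration; they are not taken from Dragulescu and Yakovenko's paper, and the parameter values used below are arbitrary.

```python
import math
import random


def simulate_path(gamma, theta, kappa, rho, t_end, n_steps, rng):
    """Euler-Maruyama sketch of Eqs. (3.3)-(3.4); returns (x, v) at time t_end."""
    dt = t_end / n_steps
    sqrt_dt = math.sqrt(dt)
    x = 0.0
    v = theta  # assumption: the variance starts at its long-time mean
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        # Second Gaussian draw, correlated with the first through rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        sqrt_v = math.sqrt(v)
        x += -0.5 * v * dt + sqrt_v * sqrt_dt * z1
        v += -gamma * (v - theta) * dt + kappa * sqrt_v * sqrt_dt * z2
        v = max(v, 0.0)  # crude fix: the discretised variance can dip below zero
    return x, v


def sample_centred_log_returns(n_paths, seed=0, **params):
    """Draw n_paths independent values of the centred log return x at t_end."""
    rng = random.Random(seed)
    return [simulate_path(rng=rng, **params)[0] for _ in range(n_paths)]


if __name__ == "__main__":
    xs = sample_centred_log_returns(
        2000, seed=1, gamma=10.0, theta=0.04, kappa=0.5, rho=0.0,
        t_end=1.0, n_steps=100)
    print("sample mean of x:", sum(xs) / len(xs))
```

Setting `kappa=0.0` freezes the variance at $\theta$ and recovers the constant-volatility Gaussian case, which gives a simple sanity check of the discretisation.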



3.3 Forward Kolmogorov 

The authors solve the forward Kolmogorov (also called Fokker-Planck) equation
that governs the time evolution of the joint probability $P_t(x, v \mid v_i)$ of
having the log return $x$ and the variance $v$ for the time lag $t$, given the
initial value $v_i$ of the variance.
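For reference, the standard Fokker-Planck construction applied to the drift and diffusion terms of (3.3) and (3.4) gives the form below. It is a sketch derived from those two equations, assuming a correlation coefficient $\rho$ between the two Wiener processes ($dW_t^{(1)} dW_t^{(2)} = \rho\,dt$) and writing $P$ for $P_t(x, v \mid v_i)$; it is not quoted from the authors' paper.

```latex
% Drift:      a_x = -v/2,        a_v = -\gamma (v - \theta)
% Diffusion:  b_{xx} = v,        b_{xv} = \rho\kappa v,      b_{vv} = \kappa^2 v
\frac{\partial P}{\partial t}
  = \gamma \frac{\partial}{\partial v}\bigl[(v - \theta) P\bigr]
  + \frac{1}{2} \frac{\partial}{\partial x}\bigl(v P\bigr)
  + \rho\kappa \frac{\partial^2}{\partial x\,\partial v}\bigl(v P\bigr)
  + \frac{1}{2} \frac{\partial^2}{\partial x^2}\bigl(v P\bigr)
  + \frac{\kappa^2}{2} \frac{\partial^2}{\partial v^2}\bigl(v P\bigr)
```

The first two terms come from the drifts of $v_t$ and $x_t$, the last three from the diffusion matrix; setting $\rho = 0$ removes the mixed derivative.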
They introduce a Fourier transform to solve this equation analytically, and
obtain the following expression for the probability distribution of centred
log-returns $x$ for a time lag $t$:

$$ P_t(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} dp_x\, e^{i p_x x + F_t(p_x)} \qquad (3.6) $$

with

$$ F_t(p_x) = \frac{\gamma\theta}{\kappa^2}\,\Gamma t - \frac{2\gamma\theta}{\kappa^2} \ln\left[\cosh\frac{\Omega t}{2} + \frac{\Omega^2 - \Gamma^2 + 2\gamma\Gamma}{2\gamma\Omega} \sinh\frac{\Omega t}{2}\right] \qquad (3.7) $$

where $\Gamma = \gamma + i\rho\kappa p_x$,
      $\Omega = \sqrt{\Gamma^2 + \kappa^2 (p_x^2 - i p_x)}$,
      $\rho$ is the correlation coefficient between the two Wiener processes
      $W_t^{(1)}$ and $W_t^{(2)}$,
      $\gamma$, $\theta$, $\kappa$ and $\mu$ are the parameters of the model.

Eq. (3.6) is the central result of their model. It gives, for a given time lag
$t$, the expected probability density of centred log returns $x$. An asymptotic
analysis¹ of $P_t(x)$ shows that it predicts a Gaussian distribution for small
values of $x$, and exponential, time-dependent tails for large values of $|x|$.
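The Fourier integral (3.6) can be evaluated numerically. The sketch below does so with a plain trapezoidal rule for the uncorrelated case $\rho = 0$, assuming the closed form $F_t(p_x) = \gamma^2\theta t/\kappa^2 - (2\gamma\theta/\kappa^2) \ln[\cosh(\Omega t/2) + ((\Omega^2 + \gamma^2)/(2\gamma\Omega)) \sinh(\Omega t/2)]$ with $\Omega = \sqrt{\gamma^2 + \kappa^2(p_x^2 - i p_x)}$. The function names, the integration cut-off `p_max`, the grid size `n_p` and the parameter values are illustrative assumptions, not the procedure used in this thesis.

```python
import cmath
import math


def log_char_fn(p, t, gamma, theta, kappa):
    """F_t(p_x) for rho = 0 (closed form assumed in the lead-in)."""
    omega = cmath.sqrt(gamma * gamma + kappa * kappa * (p * p - 1j * p))
    half = 0.5 * omega * t
    ratio = (omega * omega + gamma * gamma) / (2.0 * gamma * omega)
    log_term = cmath.log(cmath.cosh(half) + ratio * cmath.sinh(half))
    scale = gamma * theta / (kappa * kappa)
    return scale * gamma * t - 2.0 * scale * log_term


def model_pdf(xs, t, gamma, theta, kappa, p_max, n_p):
    """P_t(x) on the grid xs, by trapezoidal integration over 0 <= p_x <= p_max.

    Because F_t(-p) is the complex conjugate of F_t(p), the integral over the
    whole real line equals twice the real part of the integral over p >= 0.
    """
    dp = p_max / n_p
    ps = [k * dp for k in range(n_p + 1)]
    fs = [log_char_fn(p, t, gamma, theta, kappa) for p in ps]
    pdf = []
    for x in xs:
        total = 0.0
        for k in range(n_p + 1):
            weight = 0.5 if k in (0, n_p) else 1.0
            total += weight * cmath.exp(1j * ps[k] * x + fs[k]).real
        pdf.append(total * dp / math.pi)
    return pdf


if __name__ == "__main__":
    dx = 0.02
    grid = [-1.0 + dx * i for i in range(101)]
    density = model_pdf(grid, t=0.5, gamma=10.0, theta=0.02, kappa=0.6,
                        p_max=200.0, n_p=2000)
    print("integral of P_t(x):", sum(density) * dx)
    print("mean of x:", sum(x * d for x, d in zip(grid, density)) * dx)
```

Two quick checks are available without any data: the density should integrate to one, and the mean of $x$ should be close to $-\theta t/2$, which follows from (3.3) when the variance is averaged over its stationary distribution. The cut-off `p_max` must be large enough for the integrand to have decayed, yet small enough for `cosh` not to overflow.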

To confront their model with observed log returns, they fit the four parameters
of the model, $\gamma$, $\theta$, $\kappa$ and $\mu$, to the empirical index
(Dow Jones from January 04, 1982 to December 31, 2001) by minimising the
following mean-square deviation error

$$ E = \sum_{x,t} \left| \log P_t^*(x) - \log P_t(x) \right|^2 $$

for all available values of log returns $x$, and time lags $t$ = 1, 5, 20, 40
and 250 days, where $P_t^*(x)$ is the empirical probability mass.

In their model, the authors set the correlation coefficient $\rho$ to zero,
since (i) their trained parameter $\rho_{trained}$ is almost null
($\rho_{trained} \approx 0$) and (ii) they do not observe any difference, in the
fitting of empirical data, between taking $\rho_{trained}$ or $\rho = 0$. Hence,
they reduce the complexity of their model.

¹See P3 Part VI

Minimising the deviation of the log instead of the absolute difference
$|P_t^*(x) - P_t(x)|$ forces the parameters to fit the fat tails instead of the
middle of the distribution, where the probability mass is very high.

The results are shown in Figure 3.1.

Figure 3.1: Dragulescu vs Normal vs Empirical data

Apparently, their model (solid line) fits the empirical data (dots) far better
than the Gaussian (dashed line), especially if we look at the fat tails.
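The effect of taking logarithms in the fitting error can be seen on a toy example. The sketch below compares the squared log deviation with the plain absolute deviation for two hypothetical candidate fits, one that misses the tail and one that misses the centre. The two-bin "distributions" are invented numbers chosen only to show the mechanism.

```python
import math


def log_deviation(empirical, model):
    """Sum of squared differences between log densities."""
    return sum((math.log(pe) - math.log(pm)) ** 2
               for pe, pm in zip(empirical, model))


def abs_deviation(empirical, model):
    """Sum of absolute differences between densities."""
    return sum(abs(pe - pm) for pe, pm in zip(empirical, model))


# Hypothetical densities: one central bin and one tail bin.
empirical = [10.0, 1e-3]
misses_tail = [10.0, 1e-5]    # exact in the centre, wrong by a factor 100 in the tail
misses_centre = [9.0, 1e-3]   # 10 % off in the centre, exact in the tail

for name, fit in (("misses tail", misses_tail), ("misses centre", misses_centre)):
    print(name, abs_deviation(empirical, fit), log_deviation(empirical, fit))
```

The absolute deviation prefers the fit that misses the tail, since the tail bin contributes almost nothing to it, whereas the log deviation strongly penalises that same fit.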

3.4 Conclusion 

In their attempt to improve the classical Bachelier-Osborne model, which does
not handle the kurtosis and the fat tails of the empirical probability mass,
Dragulescu and Yakovenko started from a geometric Brownian motion for the stock
price $S_t$, assumed a mean-reverting stochastic process for the variance $v_t$,
and solved analytically the forward Kolmogorov equation that governs the joint
probability of this two-dimensional stochastic process. Then, by integrating
over the variance, they derived the probability density $P_t(x)$ of log returns
$x$ for a given time lag $t$.

Once the four parameters of the model are trained, the resulting distribution
seems to fit the empirical data far better than the Normal. We explain in
Chapter 4 the methodology we used to obtain these results. Then we replicate
these results on different datasets (different indexes) and perform some
statistical tests to measure the goodness-of-fit of this model.



Chapter 4 
Experiments 



4.1 Introduction 

Our aim is to replicate Dragulescu and Yakovenko's results and to see if they
are reproducible on other datasets, with different time periods and/or different
indexes. The model is supposed to fit any stock index, provided we set the
values of the four parameters correctly. We will test the model itself and some
of its assumptions, such as the ergodicity of the dataset. First, we use exactly
the same methodology as described in the paper, and see that strange points
appear in our results. Having clarified the origin of these points directly with
Dragulescu and Yakovenko, we make a few critical comments concerning the way
they reuse and trim the data, and propose our own methodology based on the
conservation of all the data (especially outliers, which occur during crashes
and form fat tails). As a benchmark, we test this model against the classical
Gaussian model and against a simple Neural Network.

4.2 Datasets 

Our datasets will be, first, the Dow Jones Industrial Average (DJIA) for
different time periods: the period used in Dragulescu and Yakovenko's paper
(from January 04, 1982 to December 31, 2001), the period after the 1987 crash
(from January 04, 1988 to December 31, 2001), after the 1929 crash (from January
02, 1930 to December 31, 2001) and finally since 1896 (from May 26, 1896 to
December 31, 2001) to get the largest dataset possible. Indeed, we need a very
large dataset to compute distributions for large time lags such as 250 days (we
take one point out of 250). Thanks to these datasets, we will test the
robustness of our potential patterns across different periods. In particular, we
will focus on the impact that the presence of a crash in the dataset can have on
the model.

Moreover, we will use other indexes to test these patterns against other
markets: the Standard and Poor's 500 from January 04, 1965 to December 31, 2001,
and the FTSE 100 from January 02, 1984, to December 31, 2001.

We download the data from Yahoo ([YE]) and Economy.com ([E]) for the Dow Jones
and Standard and Poor's 500, and from Datastream for the FTSE 100.

By "day", we mean trading day, since all of our datasets are composed of trading
days only: week-ends and bank holidays have been removed. By "time lag", we mean
the number of trading days between two points used to compute a log return. If
our initial dataset is composed of 1000 close prices, then for a time lag of
five days, we will take one point every five to compute the log returns. As a
consequence, our final dataset will be composed of about
$\lfloor 1000/5 \rfloor = 200$ log returns only. Nevertheless, we can begin with
the first, the second, the third, the fourth or the fifth close price, so that,
finally, we use five different datasets of about 200 log returns. This allows us
to give, for any computation, the average value and an estimate of the variance
of the result, which will give more robustness to our statistical tests.
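The construction of the five datasets described above can be sketched as follows. The function name and the synthetic price series are hypothetical and serve only to show the indexing.

```python
import math


def non_overlapping_paths(prices, lag):
    """Return `lag` series of log returns.

    Series number `start` (0-based) samples the prices at positions
    start, start + lag, start + 2 * lag, ... and takes the log of each
    successive ratio, so no price interval is shared between two returns
    of the same series.
    """
    paths = []
    for start in range(lag):
        sampled = prices[start::lag]
        paths.append([math.log(later / earlier)
                      for earlier, later in zip(sampled, sampled[1:])])
    return paths


if __name__ == "__main__":
    # Synthetic close prices growing by 0.1 % per trading day.
    prices = [100.0 * 1.001 ** day for day in range(1000)]
    paths = non_overlapping_paths(prices, lag=5)
    print(len(paths), [len(path) for path in paths])
```

With 1000 prices and a lag of five days, each series samples 200 prices and therefore holds 199 log returns, one fewer than the number of sampled prices.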

4.3 Methodology 

We first describe and follow strictly the methodology proposed by Dragulescu et
al. in their paper, in order to reproduce their results. This methodology
suffers from imperfections, especially because (i) they re-use the data and (ii)
they trim the data during the pre-processing step. This leads us to propose our
own methodology.

4.3.1 Reusing the data 
Introduction 

For a given index $I$ over a given period, let us say the Dow Jones Industrial
Average from January 04, 1982, to December 31, 2001, and a given time lag
$\tau$, let us say $\tau$ = 5 days, the raw close price dataset closePrice is
composed of $n$ close prices, here $n = 5050$. When Dragulescu and Yakovenko
compute the log returns dataset logReturns starting from closePrice, they obtain
the following time series:

$$ logReturns = \{ r_t \mid t \in [1, n - \tau] \} $$

where $r_t = \log \frac{P_{t+\tau}}{P_t}$, $t \in [1, n - \tau]$.

In our example, we would have

$$ logReturns = \{ r_1, r_2, \ldots, r_{n-\tau} \} $$
$$ = \left\{ \log\frac{P_{1+\tau}}{P_1}, \log\frac{P_{2+\tau}}{P_2}, \ldots, \log\frac{P_n}{P_{n-\tau}} \right\} $$
$$ = \left\{ \log\frac{P_6}{P_1}, \log\frac{P_7}{P_2}, \ldots, \log\frac{P_{5050}}{P_{5045}} \right\} $$

Thus, they obtain a single dataset of $n - \tau$ log returns. We believe this
way of computing the log returns time series is unfair, because it "re-uses" the
data. Indeed, let us assume that a crash occurs at time $t^*$. Then they will
take this specific event into account exactly $\tau$ times in their dataset, in
the log returns $\{ r_{t^*-\tau}, r_{t^*-\tau+1}, \ldots, r_{t^*-1} \}$.

This way of deriving, from a raw close price time series of $n$ points, a single
log returns time series of $n - \tau$ points is strictly equivalent, in terms of
the shape parameters of the final distribution (sample mean $\mu$ and sample
standard deviation $\sigma$), to deriving $\tau$ log returns time series, each
composed of $m = [n/\tau]$¹ log returns, and averaging them into a single time
series logReturns'.

$$ \forall k \in [1, m], \quad logReturns'(k) = \frac{1}{\tau} \sum_{j=1}^{\tau} logReturns_j(k) $$

with

$$ \forall j \in [1, \tau], \quad logReturns_j = \{ r_t^j \mid t \in [1, m] \} $$

where $r_t^j = \log \frac{P_{j+t\tau}}{P_{j+(t-1)\tau}}$, $t \in [1, m]$.

To put it in a nutshell, we derive $\tau$ log returns time series logReturns_j
of cardinality $m = [n/\tau]$, called "paths", instead of a single log return
time series logReturns of cardinality $n - \tau$. Then we average these $\tau$
paths to obtain a final log return time series logReturns', of cardinality $m$.
Obviously, we have:

• E[logReturns] = E[logReturns']

• Var[logReturns] = Var[logReturns']

The fact that Dragulescu and Yakovenko re-use the data is then justifiable only
if all of the paths are equivalent, viz. only if we assume that the system is
ergodic.²

¹where $[A]$ denotes the nearest integer less than or equal to $A$
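The way a single crash enters the overlapping series several times can be checked on a toy price series. The sketch below uses an invented series that is flat except for one 20 % drop, and counts how many log returns are affected under each construction. The function names are hypothetical.

```python
import math


def overlapping_log_returns(prices, lag):
    """r_t = log(P[t + lag] / P[t]) for every admissible t (0-based)."""
    return [math.log(prices[t + lag] / prices[t])
            for t in range(len(prices) - lag)]


def non_overlapping_log_returns(prices, lag, start=0):
    """Log returns over successive, disjoint windows of `lag` days."""
    sampled = prices[start::lag]
    return [math.log(later / earlier)
            for earlier, later in zip(sampled, sampled[1:])]


# Toy series: flat at 100, then a single drop to 80 between day 49 and day 50.
prices = [100.0] * 50 + [80.0] * 50
lag = 5

hits_overlapping = sum(1 for r in overlapping_log_returns(prices, lag) if r != 0.0)
hits_per_path = [
    sum(1 for r in non_overlapping_log_returns(prices, lag, start) if r != 0.0)
    for start in range(lag)
]
print(hits_overlapping, hits_per_path)  # prints: 5 [1, 1, 1, 1, 1]
```

The single drop appears five times in the overlapping series, once for each window that straddles it, but only once in each non-overlapping path.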

First test of ergodicity 

There is a simple way to test the ergodicity of the dataset: for each time lag,
we compute the sample mean $\mu_j$ and the sample standard deviation $\sigma_j$
of each path $j$. If the system is really ergodic, then we expect these shape
parameters to be almost constant from one path to the other. In other words,
their variance should be almost null. To compare like with like, instead of
giving the variance alone, we give the standard deviation (square root of the
variance) of the parameter divided by its mean, which gives us a "variation
rate". Results are presented in Tables 4.1 and 4.2 for DJIA1982 and DJIA1896
respectively.

time lag    % variation on $\mu$    % variation on $\sigma$

    5              0.58                    5.22
   20              0.76                    4.31
   40              1.73                    7.80
   80              2.22                    6.48
  100              3.43                    9.30
  200              4.29                    9.38
  250              5.44                   11.5

Table 4.1: Variation of shape parameters $\mu$ and $\sigma$ over different
paths, DJIA1982

There is no variance, and hence no variation, for a time lag of one day, since
we have only a single log returns time series.
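The "variation rate" defined above is a coefficient of variation computed across paths. A minimal sketch, assuming sample (n-1) standard deviations throughout and using a hypothetical function name and toy numbers:

```python
import statistics


def variation_rates(paths):
    """Return (% variation of the path means, % variation of the path std devs).

    Each rate is the standard deviation of the parameter across paths,
    divided by the absolute value of its mean across paths, in percent.
    """
    means = [statistics.mean(path) for path in paths]
    stds = [statistics.stdev(path) for path in paths]

    def rate(values):
        return 100.0 * statistics.stdev(values) / abs(statistics.mean(values))

    return rate(means), rate(stds)


if __name__ == "__main__":
    toy_paths = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
    print(variation_rates(toy_paths))
```

At least two paths are needed, which is why no rate can be reported for a time lag of one day. The rate on the mean becomes unstable when the average of the path means is close to zero.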

time lag    % variation on $\mu$    % variation on $\sigma$

    5              0.12                    1.46
   20              0.53                    2.26
   40              1.84                    2.46
   80              1.52                    5.15
  100              2.43                    2.82
  200              2.23                    2.33
  250              2.18                    4.84

Table 4.2: Variation of shape parameters $\mu$ and $\sigma$ over different
paths, DJIA1896

The variation rate is low for very large datasets like the Dow Jones Industrial
Average from 1896 to 2001; it is always below 5.5 %. However, for relatively
small datasets (DJIA1982, DJIA1988), the variation rate can reach 11.5 %, and
even 17 % for DJIA1988. This comes from the fact that the distribution of an
average tends to be normal (CLT³) when the sample size $n$ increases, with
variance decreasing proportionally to $1/n$. The more points we have, the lower
the variation rate, whatever the initial distribution.

²"A collection of systems forms an ergodic ensemble if the modes of behaviour
found in any one system from time to time resemble its behaviour at other
temporal periods and if the behaviour of any other system when chosen at random
also is like the one system. We do not require identical performance, only quite
similar time averages and number averages. (If you cannot tell one youth from
another or one adult from another, they belong to an ergodic ensemble.) In an
ergodic population, any single individual is representative of the entire
population. The salient characteristics of this individual are essentially
identical with any other member of the group", [MW]

Second test of ergodicity 

Given that this test is not conclusive, we perform a Kruskal-Wallis Test, which
is a nonparametric version of the One-Way Analysis of Variance ("ANOVA").⁴ The
purpose of a one-way analysis of variance is to find out whether data from
several datasets have a common mean. The assumption behind this test is that the
measurements come from a continuous distribution, but not necessarily a normal
distribution.⁵ If the p-value is near zero, this casts doubt on the null
hypothesis and suggests that at least one sample mean is significantly different
from the other sample means.

The results, presented in Tables 4.3 and 4.4, show no significant difference
between the means of the different paths, whatever the frequency. Indeed, the
p-value is always very high. This tends to support the hypothesis that all the
paths are equivalent. But if we now look at the variance (which is the core of
Dragulescu and Yakovenko's model), we see that it changes dramatically from one
path to another. To show this, we plot in Figure 4.1 the Box Plot of the
different paths.⁶
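The Kruskal-Wallis statistic itself is simple to compute from pooled ranks. The sketch below is a bare-bones version, assuming average ranks for tied values and no tie correction; the function name and the toy groups are hypothetical, and the p-value (a chi-square tail probability with `k - 1` degrees of freedom) is left out.

```python
def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of samples.

    H = 12 / (N (N + 1)) * sum_i R_i^2 / n_i - 3 (N + 1), where R_i is the
    sum of the pooled ranks of group i and N the total number of values.
    """
    pooled = sorted((value, index)
                    for index, group in enumerate(groups) for value in group)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = 0.5 * (i + 1 + j)  # mean of the ranks i + 1, ..., j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        i = j
    h = (12.0 / (n_total * (n_total + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3.0 * (n_total + 1))
    return h, len(groups) - 1


if __name__ == "__main__":
    shifted = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    same_centre = [[-1.0, -0.5, 0.5, 1.0], [-10.0, -5.0, 5.0, 10.0]]
    print(kruskal_wallis_h(shifted))
    print(kruskal_wallis_h(same_centre))
```

The second toy example has two groups with the same centre but a tenfold difference in spread, and its statistic is zero. This mirrors the situation described above: a test on location cannot reveal that the paths differ in variance.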

³See Appendix B, Section "Central Limit Theorem"
⁴See Appendix B, Section "ANOVA Test"
⁵See Appendix B, Section "Kruskal-Wallis Test"
⁶Box Plots are composed of different elements:
• The lower and upper lines of the "box" are the 25th and 75th percentiles of the sample

time lag    Chi-Square    df     p-value

    5          0.33         4    0.9875
   20          0.37        19    1
   40          1.35        39    1
   80          2.39        79    1
  100          4.49        99    1
  200         12.53       199    1
  250          9.61       249    1

Table 4.3: Kruskal-Wallis Test on means $\mu$, DJIA1982

time lag    Chi-Square    df     p-value

    5          0.07         4    0.9995
   20          0.80        19    1
   40          0.80        39    1
   80          1.30        79    1
  100          3.37        99    1
  200          2.42       199    1
  250         11.79       249    1

Table 4.4: Kruskal-Wallis Test on means $\mu$, DJIA1896

Figure 4.1: Kruskal-Wallis Test on standard deviation $\sigma$, DJIA1982
We can clearly see that the different paths are not equivalent, even if they
have almost the same mean. The number of outliers, for instance, varies
dramatically from one path to another, as indicated by the number of red crosses
outside the "whiskers". We should point out that this variation in the
volatility of the log returns has no specific relation with the well documented
"seasonal effect", since our paths are not based on calendar days but on trading
days. For instance, the five paths obtained for a 5-day time lag are not
composed of log returns from Mondays to Mondays, Tuesdays to Tuesdays, etc., but
are built on consecutive trading days. Nevertheless, in the financial
literature, analysts usually use realisations (called "paths" in this thesis)
that do not reuse the data, because they are interested in investment strategies
on a daily, weekly, monthly, etc., basis. They do not average the paths to get a
final very large dataset, as Dragulescu and Yakovenko did.

Conclusion

For technical reasons (paths are different from each other, the system is not
ergodic) and practical reasons (financial analysts do not do that), we think
Dragulescu and Yakovenko should not reuse the data, which is equivalent to
averaging the mean and the variance of the individual paths. As a consequence,
in our subsequent tests, we will present our results without reusing the data.

4.3.2 Pre-processing the data 

We perform all of our tests for different time lags: 1, 5, 20, 40, 80, 100, 200,
and 250 days. For a given time lag $\tau$ and a given dataset $D$, we compute
all of the log-return series $r_t = \log(P_{t+\tau}/P_t)$. If the price dataset
contains $n$ points (each point is the daily close price of the index
considered), then we obtain $n - \tau$ log-returns. Dragulescu and Yakovenko
trimmed the log returns, rejecting any value outside the boundaries presented in
Table 4.5.⁷

time lag    trimming boundaries

    1        [-0.04  0.04]
    5        [-0.08  0.08]
   20        [-0.13  0.15]
   40        [-0.17  0.20]
   80        [-0.18  0.25]
  100        [-0.20  0.28]
  200        [-0.22  0.38]
  250        [-0.22  0.44]

Table 4.5: Boundaries used by Dragulescu et al. to trim empirical log returns

⁶(continued)
• The distance between the top and bottom of the box is the interquartile range

• The line in the middle of the box is the sample median. If the median is not
centred in the box, that is an indication of skewness

• The "whiskers" are lines extending above and below the box. They show the
extent of the rest of the sample (unless there are outliers). Assuming no
outliers, the maximum of the sample is the top of the upper whisker. The minimum
of the sample is the bottom of the lower whisker. By default, an outlier is a
value that is more than 1.5 times the interquartile range away from the top or
bottom of the box

• The plus sign at the top of the plot is an indication of an outlier in the
data. This point may be the result of a data entry error, a poor measurement or
a change in the system that generated the data

We visualise in Figure 4.2 the effect of trimming the data: all of the log
returns outside the boundaries, represented here by the two horizontal lines,
were trimmed by the authors.

[Plot: DJIA1982, time lag = 1; horizontal axis: trading day t]

Figure 4.2: Trimmed log returns, DJIA1982, $\tau$ = 1 day

⁷This step is not mentioned in Dragulescu's paper. Before we applied this
trimming method, strange points appeared in our results. We then contacted the
authors, who informed us that they trimmed the log returns, using those
boundaries for the Dow Jones Industrial Average from January 04, 1982, to
December 31, 2001 ("DJIA1982").

We believe this way of trimming the data is unfair, because it removes
information from the dataset. Given that the model is supposed to outperform the
Bachelier-Osborne model, and especially to fit the kurtosis and the fat tails,
removing extreme values (which belong to the fat tails, and produce kurtosis) is
counter-productive. Even the normal distribution could fit the data quite well
in these conditions. To show this, we perform the Jarque-Bera and Lilliefors
tests against the normal distribution, but using the trimmed data.

Results in Tables 4.6 and 4.7, compared with the same tests on untrimmed data
(Tables 2.5 and 2.6), clearly show that when the data are trimmed as Dragulescu
and Yakovenko do, the normal hypothesis is rejected only for the higher
frequencies. This time the tests do not reject the normal hypothesis for medium
frequencies, as they did without trimming.

For this reason, we decided to perform our statistical tests without trimming
the data.
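The mechanism behind this observation can be reproduced on synthetic data. The sketch below draws a fat-tailed sample from a two-component Gaussian mixture (an invented stand-in for daily log returns, not market data), trims it at the one-day boundaries of Table 4.5, and compares the excess kurtosis and the Jarque-Bera statistic before and after trimming. The function names are hypothetical.

```python
import random


def central_moments(sample):
    """Return (n, m2, m3, m4): sample size and central moments of order 2 to 4."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return n, m2, m3, m4


def jarque_bera(sample):
    """Return (JB statistic, skewness, excess kurtosis)."""
    n, m2, m3, m4 = central_moments(sample)
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    statistic = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    return statistic, skewness, excess_kurtosis


def trim(sample, low, high):
    """Keep only the values inside [low, high]."""
    return [x for x in sample if low <= x <= high]


rng = random.Random(0)
# Synthetic returns: 95 % calm days (sd 0.01), 5 % turbulent days (sd 0.05).
raw = [rng.gauss(0.0, 0.05 if rng.random() < 0.05 else 0.01) for _ in range(5000)]
trimmed = trim(raw, -0.04, 0.04)

print("raw:    ", jarque_bera(raw))
print("trimmed:", jarque_bera(trimmed))
```

Removing the few values outside the boundaries sharply reduces both the excess kurtosis and the Jarque-Bera statistic, which makes the trimmed sample look much closer to a Gaussian than the raw one.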

4.3.3 Distributions 

To obtain the empirical distribution, we partition the log-returns into
equal-sized bins of width $\Delta r$. Then we count the number of log-returns
per bin, called the occupation number, and remove the bins for which the
occupation number is lower than a critical value of five. We initially choose
the number of bins so that we globally remove less than one percent of the
log-returns. This filtering technique is supposed to get rid of the outliers,
viz. infrequent events. Thus, we obtain the frequency distribution of
log-returns, also called in this thesis the empirical probability density
function, empPDF.
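The binning and filtering steps can be sketched as follows. The function name, the choice of spanning the bins from the smallest to the largest log return, and the normalisation by the total number of observations before filtering are assumptions made for illustration.

```python
def empirical_pdf(log_returns, n_bins, min_count=5):
    """Histogram-based density estimate with sparse bins removed.

    Returns (kept, width, removed_fraction), where `kept` is a list of
    (bin centre, density) pairs for the bins holding at least `min_count`
    values, and `removed_fraction` is the share of values that fell in
    the discarded bins.
    """
    lowest, highest = min(log_returns), max(log_returns)
    width = (highest - lowest) / n_bins
    counts = [0] * n_bins
    for r in log_returns:
        # The largest value would fall just past the last bin: clamp it.
        k = min(int((r - lowest) / width), n_bins - 1)
        counts[k] += 1
    n_total = len(log_returns)
    kept = [(lowest + (k + 0.5) * width, count / (n_total * width))
            for k, count in enumerate(counts) if count >= min_count]
    removed = sum(count for count in counts if count < min_count)
    return kept, width, removed / n_total


if __name__ == "__main__":
    data = [i / 1000.0 for i in range(1000)]
    kept, width, removed_fraction = empirical_pdf(data, n_bins=10)
    print(len(kept), width, removed_fraction)
```

The returned `removed_fraction` makes it easy to lower the number of bins until less than one percent of the log-returns is discarded, as described above. The sketch assumes the sample is not constant, otherwise the bin width would be zero.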

In order to exhibit fat tails and kurtosis, we fit a Gaussian to the empPDF, by
estimating the sample mean $\mu$ and the sample standard deviation $\sigma$ of
the set of log-returns $r_t$, and plot the Gaussian normPDF.

After having observed the departure from normality, we build the new model. We
train the four parameters of Dragulescu and Yakovenko's model by minimising the
mean-square deviation $E = \sum_{x,t} |\log P_t^*(x) - \log P_t(x)|^2$, and
compute the PDF, draguPDF, for different time lags.

Finally, we build and train a Neural Network to fit empPDF as precisely as
possible. A very simple structure is enough for this first approach. We will use
this NN, nnPDF, later in our tests as a benchmark.
lag    DJIA1982    DJIA1988    DJIA1896    DJIA1930    SP1965

  1    1           1           1           1           1
  5    1           1           1           1           1
 20    -           0.1         1           1           0.5
 40    0.175       0.375       0.225       0.575       0.175
 80    -           0.0125      -           0.025       -
100    -           0.05        -           -           -
200    -           -           -           -           -
250    -           -           -           -           -

1 

0.6 
0.1 









Table 4.6: Jarque-Bera Goodness-of-Fit Test to the Normal distribution, after 
trimming, DJIA1982 



time lag    DJIA1982    DJIA1988    DJIA1896    DJIA1930    SP1965    FTSE1984

    1       1           1           1           1           1         1
    5       1           1           1           1           1         0.8
   20       0.1         0.35        1           0.9         0.35      0.1
   40       0.05        0.025       0.3         0.525       0.125     0.05
   80       0.0375      0.025       0.075       0.1375      -         -
  100       -           -           0.09        0.04        0.03      0.01
  200       0.025       0.035       0.01        -           0.065     0.04
  250       0.032       0.02        0.016       -           0.02      0.108

Table 4.7: Lilliefors Goodness-of-Fit Test to the Normal distribution, after
trimming, DJIA1982
We can now perform goodness-of-fit tests, measures of kurtosis and measures
of fat tails on different models (Gaussian, Neural Network and Dragulescu).

Empirical Distribution 

All we have to do to obtain the empirical pdf empPDF is to divide each
occupation number by the bin size $\Delta r$ and the total number of
observations, once bins with fewer than five log-returns have been removed. Then
we centre the result (we subtract the sample mean from the x axis) to obtain the
final probability density function empPDF. We plot in Figure 4.3 the probability
mass obtained. Since the log returns are almost normal, at least according to
the classical Bachelier-Osborne model, we prefer plotting the PDFs on a
semi-logarithmic graph. On such a graph, a Gaussian appears as a parabola and
exponential tails appear as straight lines. This also has the advantage of
focusing on the tails, since on a linear plot any discrepancy in the tails looks
negligible compared to the discrepancies in the middle of the distribution,
which form the kurtosis.

Figure 4.3: Empirical distribution empPDF in linear and logarithmic scale

On the semi-log plot, a close look at the tails, especially on the left side,
reveals a series of points aligned horizontally. These events happened once, and
constitute a kind of long tail. Indeed, the probability mass (or empirical
probability density) is bounded below, and the lower limit is simply

$$ \frac{1}{\Delta r \times \text{number of events}} $$

where an event is a specific log return in the time series. We will see that our
models are all unbounded: they cannot predict those extremely isolated events.
We will have to pay attention to these long tails in our tests, and not confuse
them with the fat tails.
Gaussian Distribution

In order to exhibit fat tails and kurtosis, we compute the sample mean and
standard deviation of the log-returns data, and generate a normal distribution
normPDF with these parameters. We stress here that, as we do not have any prior
knowledge of the mean and variance of the Gaussian, we have to derive them from
the empirical log returns. This will have an impact later in our statistical
tests, since we will have to test a composite hypothesis instead of a simple
hypothesis, which is usually easier to deal with.

Figure 4.4: Empirical distribution empPDF vs Gaussian normPDF

The Gaussian seems to fit the empirical distribution quite well, but a simple
look at the graph, even if it is often useful, cannot be used as strong
evidence. We need statistical tests to measure the goodness-of-fit of the model,
as we will see later in this Chapter.

Dragulescu's Distribution 

We can now compute the distribution expected by Dragulescu's model, draguPDF. 
First, we have to set the value of the four parameters of the model. To do so, we 
minimise the mean-square deviation between the model and the empirical data. 
Once we have these values, we can generate draguPDF by integrating between 
finite bounds the expression given in the model. Using finite instead of infinite 
bounds does not seem to modify the results, provided the bound is large enough. 



Figure 4.5: Empirical distribution empPDF vs Dragulescu draguPDF

Neural Network Distribution

Even if draguPDF is supposed to fit empPDF better than normPDF, we want to
compare it with the best fit possible, the one obtained with a Neural Network.
This Neural Network must be as simple as possible, but should fit the main
characteristics of the empirical time series, fat tails and kurtosis. The
structure chosen was the following: it is a feed-forward back-propagation
network, with a five-node hidden layer and a single-node output layer. The
transfer functions are respectively tansig and purelin, where
$tansig(n) = \frac{2}{1 + e^{-2n}} - 1$ and $purelin(n) = n$. The
back-propagation function used is trainscg, a network training function that
updates the weight and bias values according to Levenberg-Marquardt
optimisation. It minimises a combination of squared errors and weights and then
determines the correct combination so as to produce a network which generalises
well. The process is called Bayesian regularisation. This structure appears to
be a good trade-off between the complexity and the goodness of fit.

We prefer not to complicate the structure, in order to have meaningful
statistical tests: indeed, a model with many parameters will obviously manage to
fit the data, but the goodness-of-fit will be very poor.⁸
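The forward pass of such a network is short enough to write out. The sketch below assumes a single scalar input (the log return $x$) and a single scalar output (the fitted density), which the text does not state explicitly. The weights are placeholders and training is not shown.

```python
import math


def tansig(n):
    """Hyperbolic tangent sigmoid, 2 / (1 + exp(-2 n)) - 1."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def purelin(n):
    """Linear transfer function."""
    return n


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One scalar input, one tansig hidden layer, one purelin output node."""
    hidden = [tansig(w * x + b) for w, b in zip(w_hidden, b_hidden)]
    return purelin(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def n_parameters(n_hidden):
    """Hidden weights and biases, output weights, and one output bias."""
    return 3 * n_hidden + 1


if __name__ == "__main__":
    # Placeholder weights, for illustration only.
    w_hidden = [0.5, -1.0, 2.0, 0.1, -0.3]
    b_hidden = [0.0, 0.2, -0.1, 0.4, 0.0]
    w_out = [1.0, 0.5, -0.5, 0.25, 2.0]
    print(forward(0.01, w_hidden, b_hidden, w_out, b_out=0.1))
    print(n_parameters(5))
```

Under these assumptions, a five-node hidden layer gives 16 adjustable parameters, against two for the Gaussian and four for Dragulescu and Yakovenko's model, which is the trade-off that matters for the goodness-of-fit tests. The formula for `tansig` is algebraically equal to `math.tanh`, but as written it overflows for very large negative arguments.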

4.4 Comparison of the models 

Now that we have obtained different models, we can compare them. Our aim is to
verify whether the Dragulescu and Yakovenko model fits the empirical data better
than the classical Gaussian. The Neural Network is used as a benchmark. We
perform the following tests without re-using or trimming the data.

⁸See Chapter 4, Section "$\chi^2$ Statistic"

Figure 4.6: Empirical distribution empPDF vs Neural Network nnPDF

For each dataset $D$ and for each time lag $\tau$, we obtain a set of
distributions: the empirical distribution empPDF computed from the empirical
log-returns, the normal distribution normPDF fitted on the empirical
log-returns, Dragulescu and Yakovenko's distribution draguPDF fitted on empPDF
and finally the neural network distribution nnPDF fitted on empPDF.

The first thing to do when comparing different models is to have a close look at
the empirical and expected cumulative distributions. Even if it can sometimes be
misleading, this usually gives a good overview of the possible discrepancies
between the theoretical and observed data. We plot in Figure 4.7 the expected
and observed cumulative distributions for the index DJIA1982 and a 5-day time
lag. It seems that Dragulescu and Yakovenko's curve fits the empirical
distribution a bit better than the Gaussian, especially if we look at the tails
in Figure 4.8. Even if it has a very simple structure, the Neural Network seems
to be the best model, except in the low tail.

To test the goodness-of-fit of our models, we will first use the Kolmogorov- 
Smirnov Statistic. Mainly because this test poorly handles the fat tails and 
because it can only test a simple hypothesis, we will then perform a goodness- 
of-fit test on equal expected frequency bins. Finally, we will use generated random 
data to focus on the kurtosis of the different models and then study the outliers 
that compose the fat and long tails. 
Figure 4.7: Cumulative distribution function of log returns, together with the
theoretical Gaussian (-), Dragulescu (-.) distributions and the Neural Network
distribution (-), on DJIA1982 dataset, for $\tau$ = 5 days

4.4.1 Kolmogorov-Smirnov Statistic 
Introduction 

Dragulescu and Yakovenko claim that their model fits the empirical data of
DJIA1982 better than the Gaussian for any time lag. To check that, we use the
Kolmogorov-Smirnov Statistic,⁹ based on the maximal discrepancy between the
expected and the observed cumulative distributions, for any log return $x$. We
perform this test on the DJIA1982 index, for different time lags. This statistic
is suitable for testing only a simple hypothesis, for instance a Gaussian with
known $\mu$ and $\sigma$, but not a composite hypothesis (a class of Gaussians,
or a Gaussian with $\mu$ and $\sigma$ derived from the tested sample dataset
itself). Unfortunately, whatever the model, we always derive the parameters
($\mu$ and $\sigma$ for normPDF; $\gamma$, $\theta$, $\kappa$ and $\mu$ for
draguPDF; the weights and biases for nnPDF) from the initial dataset.

By performing this test with parameters derived from the dataset, we expect the
statistic to be large enough to reject the simple hypothesis, and a fortiori the
composite hypothesis ([Bre75]). But if the value of $Z$ is small enough to
accept the simple hypothesis, it does not mean that we can accept the composite
hypothesis.

⁹See Appendix B, Section "Kolmogorov-Smirnov Goodness-of-Fit Test"

[Plot: DJIA1982, time lag = 5 days; horizontal axis: log return x]

Figure 4.8: Zoom around the low tail



Methodology 



For each time lag, we compute the log returns dataset, and we divide it into
paths. For each path and each model, we build the empirical cumulative
distribution function empCDF and the expected CDF modelCDF (normCDF, draguCDF or
nnCDF), and we compute the KS-statistic $Z$. We present in Tables 4.8, 4.9 and
4.10 the mean $\bar{Z}$ and standard deviation $\sigma_Z$ of $Z$ over the
different paths, and the associated p-value¹⁰ interval
$p(\bar{Z} + \sigma_Z) < p(\bar{Z}) < p(\bar{Z} - \sigma_Z)$.
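For one path and one model, the computation can be sketched as below. The statistic is taken here as the largest gap between the empirical step function and the model CDF, and the p-value comes from the asymptotic Kolmogorov series $2 \sum_{k \ge 1} (-1)^{k-1} e^{-2 k^2 n Z^2}$. Both choices, like the function names, are assumptions; the exact conventions used for the tables are those of Appendix B.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF of `sample` and `cdf`."""
    ordered = sorted(sample)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered):
        expected = cdf(x)
        # The empirical CDF jumps from i / n to (i + 1) / n at x.
        distance = max(distance, (i + 1) / n - expected, expected - i / n)
    return distance


def ks_p_value(distance, n, terms=100):
    """Asymptotic p-value of the KS statistic for a sample of size n."""
    lam_sq = n * distance * distance
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam_sq)
                 for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * series))


if __name__ == "__main__":
    sample = [0.1, 0.4, 0.7]
    z = ks_statistic(sample, lambda x: x)  # uniform CDF on [0, 1]
    print(z, ks_p_value(z, len(sample)))
```

A model CDF such as `normal_cdf` can be passed with its fitted parameters, for instance `lambda x: normal_cdf(x, mu, sigma)`. With parameters estimated from the same sample, this p-value is only indicative, for the reason given above.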
time lag    Z                p-value

    1       0.131                             2.93e-75
    5       0.081 ±0.020     1.75e-09    <    2.98e-06    <    9.88e-04
   20       0.089 ±0.013     9.51e-03    <    0.036       <    0.112
   40       0.104 ±0.020     0.038       <    0.122       <    0.321
   80       0.113 ±0.021     0.199       <    0.385       <    0.649
  100       0.113 ±0.021     0.322       <    0.533       <    0.778
  200       0.148 ±0.038     0.339       <    0.630       <    0.917
  250       0.170 ±0.047     0.291       <    0.598       <    0.919

Table 4.8: KS-Test on the Gaussian, DJIA1982



time lag    Z                p-value
1           0.109                           1.2e-53
5           0.087 ± 0.019    2.08e-10   <   3.64e-07   <   1.48e-04
20          0.089 ± 0.014    8.75e-03   <   0.033      <   0.104
40          0.094 ± 0.010    0.125      <   0.211      <   0.337
80          0.116 ± 0.018    0.197      <   0.355      <   0.576
100         0.128 ± 0.019    0.215      <   0.372      <   0.585
200         0.163 ± 0.048    0.209      <   0.512      <   0.893
250         0.186 ± 0.046    0.224      <   0.481      <   0.816

Table 4.9: KS-Test on Dragulescu model, DJIA1982



time lag    Z                p-value
1           0.106                           3.27e-42
5           0.048 ± 0.014    1.64e-03   <   0.026      <   0.204
20          0.047 ± 0.009    0.430      <   0.615      <   0.778
40          0.071 ± 0.059    0.153      <   0.746      <   1
80          0.075 ± 0.061    0.729      <   0.919      <   0.995
100         0.076 ± 0.039    0.532      <   0.871      <   0.999
200         0.116 ± 0.034    0.126      <   0.453      <   0.932
250         0.137 ± 0.041    0.190      <   0.502      <   0.904

Table 4.10: KS-Test on the Neural Network, DJIA1982
Results

First, we observe a large variance, over the different paths, in the statistic
Z: the standard deviation σ_Z is not negligible in comparison with the mean Z̄.
This is further evidence that the paths are not equivalent. We had observed a
similar phenomenon on empirical data^11 when computing their kurtosis. It comes
from the high heterogeneity of the dataset, which makes our tests less robust.
But any test performed on this heterogeneous dataset would suffer from the same
issue. Even though this apparent lack of consistency prevents us from drawing
any strong and global conclusion, the knowledge of the mean and standard
deviation Z̄ ± σ_Z provides us with a fair overview of the statistic Z.

On plots, the Dragulescu and Yakovenko model seems to fit the empirical
cumulative distribution better than the Gaussian. But in fact, on average, both
models are rejected for high frequencies (for τ = 1 and 5 days) at the 0.01
level of significance. Even the Neural Network is rejected for a one day time
lag. This rejection of the three models may come from the fact that this test
is based on the maximum discrepancy between the empirical and the theoretical
cumulative distributions, for any x. To pass this test, a model must fit the
observed data sufficiently well everywhere, i.e. in the tails (problem of fat
tails) and in the middle (problem of high kurtosis for high frequencies) of the
distribution. We will test the kurtosis and the tails of the models separately
later in this Chapter.

We point out that even if the Dragulescu and Yakovenko model is rejected for
a one day time lag, its statistic Z is smaller than the Gaussian one (0.109 vs
0.131), which is an indication that the model fits the data a bit better. For
other time lags, the p-values are equivalent: both models are systematically
rejected for 5 days (p < 0.01), sometimes rejected for 20 days
(p(Z̄ + σ_Z) < 0.01, but p(Z̄ − σ_Z) > 0.05) and never rejected for lower
frequencies, i.e. larger time lags (p > 0.05). For medium and low frequencies,
the fact that the simple hypothesis is not rejected does not guarantee that the
composite hypothesis can be accepted.

^10 The p-value is the probability, under the assumption that the null
hypothesis (the tested model) is true, of observing a sample result at least as
extreme as the given one. If the p-value is less than the level of significance
α, then you reject the null hypothesis. For example, if α = 0.05 and the p-value
is 0.03, then you reject the null hypothesis. The converse is not true: if the
p-value is greater than α, you have insufficient evidence to reject the null
hypothesis.

^11 See Chapter 2, "Measure of kurtosis: Jarque-Bera Test"
Conclusion

The Kolmogorov-Smirnov Goodness-of-Fit Test rejects both the Gaussian and the
Dragulescu and Yakovenko model for high frequencies (τ = 1 and 5 days). For
medium and low frequencies, we cannot conclude because of the theoretical limits
of this test. To investigate further, we need a more powerful statistical test
that can be used even if the parameters of the model are derived from the tested
dataset itself. The χ² statistic is suitable in those conditions.

4.4.2 χ² Statistic
Introduction

The χ² Goodness-of-Fit Test, based on binned data, is a powerful statistical
tool to test whether an empirical sample comes from a given distribution.^12
Contrary to the Kolmogorov-Smirnov test, it is designed to evaluate a composite
hypothesis, i.e. the parameters of the model can be derived from the empirical
dataset tested. This test is a good trade-off between the goodness-of-fit of a
model (the better the fit, the smaller the statistic) and its complexity (the
more complex the model, the larger the number of parameters m). Indeed, even if
a model fits the empirical data very well, too high a complexity may penalise
its p-value, so that it can still be rejected.

Finally, to be meaningful, this test must be performed using relatively large
bins, and a critical value of 5 expected observations per bin is regarded as a
minimum.

Methodology 

If we perform this test with equal-size bins, then the fat tails will be
trimmed (there are fewer than 5 expected log returns per bin in the tails) and
will not contribute to the value of the statistic, making the test inaccurate.
Instead, we split the log return axis into bins of equal expected frequency, so
that all of the log returns contribute to the value of the statistic. We use an
expected frequency of 5 log returns per bin.

Unfortunately, this test cannot be performed for large time lags, because of
the lack of data. Indeed, in the DJIA1982 index for instance, we have initially
around 5000 close prices, which means that for a time lag of 250 days, each path
will have only about 20 log returns. In those conditions, because of the
critical value of 5 log returns per bin, we end up with only 4 bins, which is
too few to perform a relevant test.

^12 See Appendix B, Section "Chi-Square Goodness-of-Fit Test"

Results 

We present our results of the χ² Goodness-of-Fit Test in Tables 4.11, 4.12 and
4.13. The number of degrees of freedom is given by df = noBins − 1 − m, where m
is the number of parameters of the model (m = 2 for the Gaussian, m = 4 for the
Dragulescu and Yakovenko model, and m = 11 for the Neural Network if we count
the weights and the biases). For large time lags, df becomes smaller and smaller
because noBins decreases, as explained above.
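
As an illustration of the binning and of the degrees-of-freedom formula, here is a minimal Python sketch for the Gaussian case. It is not the thesis code: the function name `chi_square_equal_frequency` is hypothetical, the sample is simulated, and the bins of equal expected frequency are obtained by cutting the model CDF into equal probability slices, which is one possible way to build them. The p-value is left out because it requires the χ² distribution function, which the Python standard library does not provide.

```python
import random
from statistics import NormalDist, mean, stdev


def chi_square_equal_frequency(sample, dist, n_params, expected_per_bin=5):
    """Chi-square statistic and df with bins of equal expected frequency.

    `dist` must expose cdf(x). Under the tested model, cdf(x) is uniform on
    [0, 1], so cutting [0, 1] into no_bins equal slices gives bins that all
    have the same expected count n / no_bins.
    """
    n = len(sample)
    no_bins = n // expected_per_bin
    expected = n / no_bins
    observed = [0] * no_bins
    for x in sample:
        k = min(int(dist.cdf(x) * no_bins), no_bins - 1)
        observed[k] += 1
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    df = no_bins - 1 - n_params
    return chi2, df


# Simulated path of 1000 log returns (illustrative values only).
random.seed(1)
path = [random.gauss(0.0, 0.01) for _ in range(1000)]

# Gaussian model with mu and sigma derived from the path itself (m = 2).
model = NormalDist(mean(path), stdev(path))
chi2, df = chi_square_equal_frequency(path, model, n_params=2)
print(f"chi2 = {chi2:.1f}, df = {df}")

# With only 20 log returns (time lag of 250 days), 4 bins remain: a model
# with m = 4 parameters leaves a negative number of degrees of freedom.
short_path = path[:20]
_, df_short = chi_square_equal_frequency(short_path, model, n_params=4)
print(f"df for a 20-return path and m = 4: {df_short}")
```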

Concerning the Neural Network, we cannot perform this test for time lags
higher than 40 days, or else the number of degrees of freedom decreases to zero.
This is due to the relatively high number of parameters (m = 11). With an even
more complicated structure, we could not have performed the test at all, except
for high frequencies.

First, we notice that the Neural Network's statistic is slightly smaller than
Dragulescu's, itself smaller than the Gaussian's, for all paths with a time lag
from τ = 1 to τ = 80 days. It means that the Neural Network fits the empirical
data better than the Dragulescu model, which itself fits better than the
Gaussian. But there is a price to pay in terms of complexity: because of their
larger number of parameters (and hence fewer degrees of freedom), the p-values
of the Neural Network and of the Dragulescu model are not systematically higher
than the p-value of the Gaussian. And it is precisely the p-value that is used
to accept or reject a model, not the statistic itself.

If we look at the p-values in detail, we observe that

• For τ = 1, the three models are rejected at a 0.05 level of significance

• For τ = 5, only the Neural Network is systematically accepted. The
Gaussian and the Dragulescu model are only accepted in the best situation
(p(χ̄²) < 0.05 < p(χ̄² − σ_χ²))
time lag    χ²             df      p-value
1           1790           1010                   6.29e-11
5           255 ± 30       198     5.38e-05   <   4.07e-03   <   0.0931
20          61 ± 12        47      7.99e-03   <   0.0819     <   0.409
40          29.1 ± 7.0     22      0.0295     <   0.141      <   0.451
80          10.4 ± 4.6     9       0.0915     <   0.32       <   0.76

Table 4.11: χ² Test on the Gaussian, DJIA1982



time lag    χ²             df      p-value
1           1420           1000                   1.16e-04
5           244 ± 26       196     3.32e-04   <   0.0108     <   0.133
20          48.5 ± 11.5    45      0.0663     <   0.333      <   0.796
40          27.3 ± 6.1     20      0.0301     <   0.126      <   0.385
80          9.7 ± 4.4      7       0.049      <   0.206      <   0.624

Table 4.12: χ² Test on Dragulescu model, DJIA1982



time lag    χ²             df      p-value
1           2230           997                    0.0839
5           232 ± 38       189     0.0559     <   0.346      <   0.817
20          45.9 ± 11.1    38      0.0473     <   0.25       <   0.688
40          21.5 ± 6.3     13      0.0057     <   0.0552     <   0.333
80          7.6 ± 6.3      -       NaN        <   NaN        <   NaN

Table 4.13: χ² Test on the Neural Network, DJIA1982
• For τ = 20, the three models are accepted and the Dragulescu model is better
than the Neural Network and the Gaussian

• For τ = 40 and τ = 80, the Dragulescu model is still accepted, but its p-value
is smaller than that of the Gaussian

Conclusion 

Thanks to the χ² Goodness-of-Fit Test, we can assert that the Dragulescu and
Yakovenko model fits empirical data slightly better than the Gaussian, for high
and medium frequencies. Nevertheless, both models are rejected for high
frequencies (τ = 1 and 5 days), at a 0.05 level of significance. In this sense,
these results are consistent with the Kolmogorov-Smirnov Goodness-of-Fit Test.

We also observe a clear shift in the goodness-of-fit of the models around
τ = 40 days: the probability of accepting the Gaussian becomes larger than the
probability of accepting the Dragulescu model (and even the Neural Network) due
to the lower complexity of the Normal model (two parameters instead of four and
eleven respectively).

In a nutshell, using a complex model, such as the Dragulescu model or a Neural
Network, is only worthwhile for τ = 1, 5 and 20 days. For lower frequencies
(τ ≥ 40 days), the Gaussian is preferable because it is simpler. Given that for
these frequencies we had observed neither fat tails nor kurtosis in the
empirical datasets, the Gaussian represents the best trade-off between
goodness-of-fit and complexity.

4.4.3 Measure of kurtosis 
Introduction 

As attested by Figures 4.7 and 4.8 and by the results of the χ² Goodness-of-Fit
Test, the Dragulescu and Yakovenko model fits the empirical data slightly
better than the Gaussian, for any time lag. Nevertheless, both models are
rejected for high frequencies, characterised by prominent fat tails and high
kurtosis, as shown in Section 2. Hence, we should try to find out whether the
models are rejected mainly because of fat tails, kurtosis, or both. First, let
us have a look at the kurtosis, as it is easy to test. We will concentrate on
the tails in the next Section.
Methodology

We perform our tests on the DJIA1982 index, without reusing or trimming the
data. For each time lag, we compute the log returns dataset, and we divide it
into paths. For each path i, we start by computing the observed kurtosis,
exactly as we did in Chapter 2. Then, for each model, we build the PDF (empPDF,
normPDF, draguPDF or nnPDF, where empPDF is the empirical PDF), and use it to
generate random data, i.e. plausible log returns time series. More precisely,
we generate N = 100 random datasets of noLogReturns elements, where
noLogReturns is the number of log returns in the initial path from which we
derived the model distribution. Finally, we compute the kurtosis of these time
series and obtain, for each path i, a mean value k̄_i and a standard deviation
σ_{k_i} over the N simulations.

We already know that the empirical data exhibit kurtosis mainly for τ = 1 and
5 days, and that even for those time lags, they do not exhibit kurtosis
consistently for each path^13. As a consequence, we expect a good model to
produce kurtosis only when the empirical data does. For each path, we give in
Tables 4.14 and 4.15 the average kurtosis k̄_i, and its standard deviation
σ_{k_i}, produced by the N simulations.

We expect almost the same results as in Table 2.1 for empPDF, no kurtosis
for normPDF, and similar kurtosis as in Table 2.1 for draguPDF and nnPDF,
since these models are supposed to fit the data sufficiently well.
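
The simulation loop described above can be sketched as follows. This is an illustration only, under assumptions that are not taken from the thesis: the function names are hypothetical, the "observed" path is a toy fat-tailed series, drawing from empPDF is approximated by resampling the path with replacement, and only the Gaussian model is simulated. The kurtosis is the excess kurtosis, so that a Gaussian gives a value close to 0.

```python
import random
from statistics import mean, stdev


def excess_kurtosis(xs):
    """Fourth standardised moment minus 3 (about 0 for Gaussian data)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0


def simulated_kurtosis(sampler, no_log_returns, n_sim=100):
    """Mean and standard deviation of the kurtosis over n_sim random datasets."""
    ks = [excess_kurtosis([sampler() for _ in range(no_log_returns)])
          for _ in range(n_sim)]
    return mean(ks), stdev(ks)


random.seed(42)
no_log_returns = 1000

# Toy fat-tailed path: 2% of the days are ten times more volatile.
path = [random.gauss(0.0, 0.1 if random.random() < 0.02 else 0.01)
        for _ in range(no_log_returns)]
observed = excess_kurtosis(path)

# normPDF: Gaussian with the mean and standard deviation of the path.
mu, sigma = mean(path), stdev(path)
norm_mean, norm_sd = simulated_kurtosis(lambda: random.gauss(mu, sigma),
                                        no_log_returns)

# empPDF stand-in: resample the path with replacement.
emp_mean, emp_sd = simulated_kurtosis(lambda: random.choice(path),
                                      no_log_returns)

print(f"observed kurtosis : {observed:.2f}")
print(f"from normPDF      : {norm_mean:.2f} +/- {norm_sd:.2f}")
print(f"from empPDF       : {emp_mean:.2f} +/- {emp_sd:.2f}")
```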

Results 

observed    from empPDF      from normPDF     from draguPDF     from nnPDF
69.27       22.21 ± 20.48    -0.01 ± 0.07     107.69 ± 16.66    30.55 ± 24.9964

Table 4.14: Kurtosis of generated data, DJIA1982, τ = 1 day

The variance in the kurtosis is extremely large, for observed data and
for generated random data. Because the aim of this thesis is to compare a model
with observed data, the analysis of the origin of this variance is out of our
scope. Here, we are only interested in the capacity of the different models to
reproduce a high kurtosis when it is exhibited by observed data.

^13 See Chapter 2, Section "Measure of kurtosis: Jarque-Bera Test"
observed    from empPDF      from normPDF     from draguPDF     from nnPDF
38.20       14.87 ± 13.35    0.07 ± 0.20      2.35 ± 1.99       28.43 ± 13.12
30.49       10.06 ± 10.17    0.11 ± 0.24      3.38 ± 4.40       14.74 ± 10.72
5.91        4.12 ± 1.76      -0.03 ± 0.14     1.70 ± 0.68       5.10 ± 2.31
5.77        3.13 ± 1.97      -0.05 ± 0.14     1.80 ± 1.03       1.63 ± 0.44
3.99        3.08 ± 1.30      0.03 ± 0.17      2.04 ± 0.59       3.08 ± 1.09

Table 4.15: Kurtosis of generated data, DJIA1982, τ = 5 days

As expected, the Gaussian never exhibits kurtosis, since by definition the
kurtosis is a departure from normality. Moreover, the simulated time series
generated from the empirical PDF empPDF and the Neural Network nnPDF exhibit
high kurtosis in accordance with observed data, even if their kurtosis is in
general smaller (22.21 ± 20.48 < 69.27 for τ = 1, 14.87 ± 13.35 < 38.20 and
10.06 ± 10.17 < 30.49 for τ = 5).

By contrast, random data generated from draguPDF exhibit a very high
kurtosis for a one day time lag, even higher than expected (107.69 ± 16.66 >
69.27), as attested by the high peakedness of the distribution in Figure 4.9.
Moreover, for a five day time lag, this model clearly fails to produce kurtosis,
whatever the path, as shown by column (4) of Table 4.15.

Figure 4.9: Empirical and expected PDFs, DJIA1982, τ = 1 day

This could explain why the Dragulescu and Yakovenko model is rejected by both
the Kolmogorov-Smirnov and the χ² goodness-of-fit tests for high frequencies,
whereas the Neural Network, for instance, is not rejected for a five day time
lag.

Conclusion 

Our conclusion is that the Dragulescu and Yakovenko model does not handle
the kurtosis correctly: it exhibits too high a kurtosis for a one day time lag,
and not enough for a five day time lag. In this sense, although it is better
than the Gaussian because it can at least produce some kurtosis, this model is
still rejected for high frequencies. On the plots, this corresponds to too large
a probability mass in the centre and in the tails of the distribution, which
shows up as a pronounced peak and fat tails. We will focus on the fat tails in
the next Section.

4.4.4 Measure of the fat tails 
Introduction 

Fat tails are very difficult to handle, because they correspond to exceptional
events, statistically not significant, but terribly important for stock markets.
Indeed, they are caused by crashes and bubbles that happen far more often than
predicted by the Gaussian. By trimming the data, Dragulescu and Yakovenko
removed some of them, which explains why their plots look so smooth. Using
bins of equal expected frequency in the χ² goodness-of-fit test was the only way
to keep them. In this Section, we investigate these extreme events and try to
capture them as precisely as possible.

Methodology 

In 1962, E. Fama defended Mandelbrot's stable-Paretians against the Gaussian
especially because stable-Paretians could produce fat tails. He was the first to
exhibit those fat tails clearly, and used a simple test (among others) to do so.
The idea is to count the number of outliers, viz. the number of log returns
outside μ ± 2σ, μ ± 3σ and μ ± 4σ. If the log returns really followed a normal
distribution, then the number of outliers should be respectively
noLogReturns * 0.0455, noLogReturns * 0.0027 and noLogReturns * 0.000063, where
noLogReturns is the total number of log returns in the given path. If we compare
this expected value with the real number of outliers for normPDF, then we should
capture the fat tails. In Tables 4.16, 4.17 and 4.18 we indicate, for a given
level of deviation (μ ± 2σ, 3σ and 4σ), the expected number of outliers if the
log returns were normally distributed in Column (2) and the observed (regarding
normPDF) number of outliers in Column (3). For instance, for a one day time lag,
if the log returns followed a normal distribution, we would expect around 13.63
out of 5049 log returns outside the boundaries μ ± 3σ. We observed, however, 50
outliers (i.e. observations outside three standard deviations of normPDF). It
indicates that the Gaussian dramatically underestimates the number of outliers.
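
The outlier count can be sketched in a few lines of Python. This is an illustrative sketch with hypothetical function names and a toy dataset, not the thesis code. The Gaussian tail probabilities are computed exactly with the complementary error function; they round to the fractions 0.0455, 0.0027 and 0.000063 used above, so the expected counts may differ from the tables in the last digit.

```python
import math


def expected_outlier_fraction(k):
    """P(|X - mu| > k * sigma) when X is normally distributed."""
    return math.erfc(k / math.sqrt(2.0))


def count_outliers(log_returns, k):
    """Return (expected, observed) numbers of log returns outside mu +/- k*sigma."""
    n = len(log_returns)
    mu = sum(log_returns) / n
    sigma = math.sqrt(sum((r - mu) ** 2 for r in log_returns) / (n - 1))
    observed = sum(1 for r in log_returns if abs(r - mu) > k * sigma)
    return n * expected_outlier_fraction(k), observed


# Expected counts for a path of 5049 log returns (one day time lag).
for k in (2, 3, 4):
    print(f"mu +/- {k} sigma: expected {5049 * expected_outlier_fraction(k):.4f}")

# Toy path: 99 flat days and one extreme move.
toy_path = [0.0] * 99 + [10.0]
expected, observed = count_outliers(toy_path, 3)
print(f"toy path, 3 sigma: expected {expected:.2f}, observed {observed}")
```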

Results 

In Table 4.16, the real number of outliers is systematically lower than the
expected number: it means that we do not have fat tails at a level of 2σ. In
fact, the tails are not fatter than expected by a Gaussian. But for high
frequencies (τ = 1 and 5 days), fat tails appear at a level of 3σ (50 ≫ 13.63,
7 > 2.72) and 4σ (22 ≫ 0.32, 5 ≫ 0.06). For medium frequencies, the expected
number of outliers is lower than one, and the real number is around one or two:
these log returns correspond to extremely rare events that appear very far from
the mean (more than 4 standard deviations!); we classify them as long tails.

To summarise, for high frequencies, the Gaussian exhibits fat tails outside
μ ± 3σ and long tails after μ ± 4σ. Fat tails correspond to crashes (resp.
bubbles) and occur far more often than predicted, whereas long tails correspond
to exceptional huge crashes (resp. huge bubbles). For medium frequencies, the
Gaussian exhibits long tails at μ ± 3σ, but no fat tails. Finally, for low
frequencies, the Gaussian exhibits neither fat nor long tails.

Unfortunately, the tails are not as easy to isolate statistically for the other
models, Dragulescu and the Neural Network. Nevertheless, given that both these
models outperform the Gaussian, we expect them to fit the tails a bit better. We
can verify that by a mere observation of the plots, for instance the left tails
(which correspond to crashes) of the CDFs of the different models, in Figure
4.10.

We clearly see on this plot the poor fit of the Gaussian, the slightly better
fit of the Dragulescu and Yakovenko model, and the very good fit of the Neural
Network. But we have to keep in mind that the complexity of those models is
greater as well, which explains why the Gaussian is still preferable for medium
and low frequencies. For five days, this phenomenon persists, but is less
time lag    expected    observed in normPDF
1           229.72      205
5           45.90       30
20          11.42       12
40          5.68        5
80          2.82        2
100         2.23        2
200         1.09        2
250         0.86        1

Table 4.16: Expected and observed number of outliers in fat tails, outside μ ± 2σ



time lag    expected    observed in normPDF
1           13.63       50
5           2.72        7
20          0.67        2
40          0.33        1
80          0.16        1
100         0.13        1
200         0.06        -
250         0.05        -

Table 4.17: Expected and observed number of outliers in fat tails, outside μ ± 3σ



time lag    expected    observed in normPDF
1           0.3181      22
5           0.0636      5
20          0.0158      2
40          0.0079      1
80          0.0039      -
100         0.0031      1
200         0.0015      -
250         0.0012      -

Table 4.18: Expected and observed number of outliers in fat tails, outside μ ± 4σ
Figure 4.10: Empirical and expected CDFs, DJIA1982, τ = 1 day

Figure 4.11: Empirical and expected CDFs, DJIA1982, τ = 5 days

Figure 4.12: Empirical and expected CDFs, DJIA1982, τ = 20 days

prominent (Figure 4.11). Finally, by way of comparison, we plot the same figure
for medium frequencies (τ = 20 days), where all of the models are accepted
(Figure 4.12). We can see that the difference of fit between the Dragulescu and
Yakovenko distribution and the Gaussian gradually becomes smaller and smaller.

4.5 Conclusion 

We performed in this Chapter many tests on the Dragulescu and Yakovenko
model, claimed to outperform the Gaussian distribution (Bachelier-Osborne
model) for any time lag. We used a simple Neural Network as a benchmark. Our
first conclusion was that the data should be neither re-used nor trimmed, as the
authors do, because it makes them so smooth that any model could fit them
sufficiently well to be accepted, even the Gaussian itself. Then, thanks to two
goodness-of-fit tests, based on the Kolmogorov-Smirnov and the Chi-Square
statistics, we could reject the Dragulescu and Yakovenko model for high
frequencies (τ = 1 and 5 days), mainly because it is unable to produce the
correct kurtosis, even if the fit of the fat tails is improved.

For medium and low frequencies, the fit of the model is better than the
Gaussian, but the price to pay in terms of complexity is too high (we introduced
four parameters instead of two), so that finally the Normal distribution appears
to be the best trade-off between goodness-of-fit and complexity.



Chapter 5 

Conclusions and further work 



5.1 Conclusions 

The goal of this research was to evaluate quantitatively a new model of stock 
market returns, based on a stochastic volatility. 

To accomplish this task, we introduced, during the early stages, the notion of
a Random Walk, followed by an overview of the Geometrical Brownian Motion,
the mathematical theory underlying the statistical models of stock markets, and
in particular the classical Bachelier-Osborne model, according to which the log
returns follow a Normal distribution (a Gaussian).

At this point, two principal departures from this model were identified and
analysed: fat tails and kurtosis, exhibited by empirical data mainly for high
frequencies (time lag τ = 1 and 5 days).

This evaluation was followed by a description of the new model, published
in March 2002 by Dragulescu and Yakovenko [DY], and claimed to outperform
the Gaussian for any time lag. Statistical tests, such as the Jarque-Bera test
and the Lilliefors test, and goodness-of-fit tests, based on the
Kolmogorov-Smirnov statistic and the χ² statistic, were then performed.

First, we demonstrated that the way Dragulescu and Yakovenko trim and
reuse their dataset is unfair, and we proposed our own methodology, based on
the conservation of all the data.

Then, we found out that their model does fit the empirical data slightly
better than the Gaussian for any time lag. Nevertheless, the Gaussian is
preferable to any more complicated model (the Dragulescu model or a Neural
Network) for medium and low frequencies (time lag τ ≥ 40 days), essentially
because empirical data exhibit neither fat tails nor kurtosis for these
frequencies. Hence, the Gaussian represents the best trade-off between
goodness-of-fit and complexity.

Finally, we tried to investigate why both models were rejected for high
frequencies (time lag τ = 1 and 5 days) at a 0.05 level of significance, and
concentrated first on the kurtosis and then on the fat tails.

5.2 Further work 

Throughout this research, we realised that none of the statistical models could
handle the extreme events, huge crashes or bubbles, that generate long tails,
beyond the fat tails. We reach here one of the limits of a statistical approach,
where the expected probability density is unbounded, contrary to the observed
one. If used in risk management or option pricing, those models, to be complete,
should absolutely be coupled with ad hoc rules for extreme events. Indeed,
people tend to overweight outcomes that are considered certain, which led
Dragulescu and Yakovenko to trim the long tails, considered as too rare to
deserve attention. This is known as the certainty effect, described by
D. Kahneman and A. Tversky in their Prospect Theory about decision under risk
[KT79]. And just like the classical Expected Utility Theory cannot handle this
certainty effect, classical statistical models cannot handle the phenomenon of
long tails.

This leads us to a second possible improvement: with the recent expansion of
Artificial Intelligence techniques, a new class of models, called "agent
models", has appeared. Those models are based on the representation of clusters
of traders who can communicate and interact with each other, following very
simple rules. Here, the traditional i.i.d. hypothesis for stock returns is
rejected, and replaced by basic assumptions about the inter-dependence of
traders. Those models ([CB00], [Edm]) have a strong explanatory power, contrary
to the statistical ones, and can handle the very extreme events that constitute
the long tails. We might investigate these models very soon, as a PhD student.
Appendix A

Elements of stochastic calculus 



Markov processes 

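The stochastic processes we consider here are random vectors that depend on
time $t$ and take their values in $\mathbb{R}^n$:

\[ \underline{X}(t) \in \mathbb{R}^n \]

where $\underline{X}$ represents an $n \times 1$ vector. We suppose that
$\underline{X}(t)$ represents the state of the world at time $t$, i.e. contains
all of the characteristics of the market (price, mean, volatility, etc.).

Let us consider a process $\underline{X}(t)$, a series of successive instants
$t_0, t_1, \ldots, t_{m-1}, t_m$ in $[0, T]$, real vectors
$\underline{x}_0, \underline{x}_1, \ldots, \underline{x}_{m-1}$ and
$\underline{x}$, and the probability that $\underline{X}(t_m)$ is less than or
equal to $\underline{x}$ given that $\underline{X}(t_0) = \underline{x}_0$,
$\underline{X}(t_1) = \underline{x}_1$, ...,
$\underline{X}(t_{m-1}) = \underline{x}_{m-1}$. This conditional probability is
written:

\[ P\left(\underline{X}(t_m) \le \underline{x} \mid \underline{X}(t_0) = \underline{x}_0,\ \underline{X}(t_1) = \underline{x}_1,\ \ldots,\ \underline{X}(t_{m-1}) = \underline{x}_{m-1}\right) \]

In a Markov process, by definition, the conditional probabilities of
$\underline{X}(t_m)$ depend only on $(t_{m-1}, \underline{x}_{m-1})$, whatever
the series $(t_0, t_1, \ldots, t_m)$ and $\underline{x}$ are. To sum up, given
the present, the future is conditionally independent of the past.

The conditional probability above is then simplified:

\[ P\left(\underline{X}(t_m) \le \underline{x} \mid \underline{X}(t_{m-1}) = \underline{x}_{m-1}\right) \]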

This definition is valid in two distinct situations: the state of the system can
be defined for discrete values of the time $t$, or it can be defined for any
time in a given interval ($\forall t \in [0, T]$). Processes of the first type
are called "discrete", whereas processes of the second type are called
"processes in continuous time".

Processes in continuous time can be split into two categories: "continuous
processes" and "jump processes".

In continuous processes, also called "diffusion processes" or "Ito processes",
only infinitesimal variations of $\underline{X}$ are possible during the time
interval $dt$, whereas jump processes are characterised by some discontinuities.

Many models have been developed to study the time series of the price, mean,
volatility, etc. of securities. Continuous and jump processes have been used
frequently.

Brownian motion 

The theory of diffusion processes has been developed by mathematicians and
physicists at the end of the XVIIIth century. A diffusion process can be seen as
the limit of a discrete Markov process. This vision enables us to highlight the
different properties of such processes and to justify their use in Financial
Market models. First, let us have a look at a specific diffusion process: the
Brownian motion.

The description "Brownian motion" comes from the fact that the same process
describes the physical motion of a particle subject to random shocks, a
phenomenon first noted by the British botanist Brown in 1828, observing the
irregular movement of pollen suspended in water.

Consider a unidimensional discrete Markov process observed at different times,
with values $X(0), X(1), \ldots, X(t)$, characterised by independent and
identically distributed (i.i.d.) increments of mean $\mu$ and standard deviation
$\sigma$. This particular process, called "Brownian motion", is used in Physics
to describe the motion of a particle in suspension, and in Financial Markets to
describe the evolution of security yields. Let

\[ N(t) = \frac{X(t+1) - X(t) - \mu}{\sigma} \]

Then $\forall t < T$, $E[N(t)] = 0$, $\mathrm{Var}[N(t)] = 1$ and the $N(t)$ are
i.i.d. $X$ follows the equation

\[ X(t+1) = X(t) + \mu + \sigma N(t) \]

Let us now consider the partition of the interval $[t, t+1]$ into $n$
sub-intervals of length $l = 1/n$: $[t + (i-1)l,\ t + il]$,
$\forall i = 1, \ldots, n$. Let us assume that the variation $X(t+1) - X(t)$ is
the sum of the $n$ elementary variations $\Delta_i X$ of each sub-interval:

\[ X(t+1) - X(t) = \sum_{i=1}^{n} \left[ X(t + il) - X(t + (i-1)l) \right] = \sum_{i=1}^{n} \Delta_i X \]

To conclude, let us assume that the elementary variations
$X(t + il) - X(t + (i-1)l) = \Delta_i X$ are themselves i.i.d., with mean $A$
and standard deviation $v$:

\[ \Delta_i X = A + v\,U(i) \]

where $E[U(i)] = 0$, $\mathrm{Var}[U(i)] = 1$ and the $U(i)$ are i.i.d.

Then:

• $E[X(t+1) - X(t)] = \sum_{i=1}^{n} E(\Delta_i X) = nA = \mu$, so $A = \mu/n = \mu l$

• $\mathrm{Var}[X(t+1) - X(t)] = \sum_{i=1}^{n} \mathrm{Var}(\Delta_i X) = n v^2 = \sigma^2$, so $v = \sigma/\sqrt{n} = \sigma\sqrt{l}$,
since $v > 0$ and $\sigma > 0$

As a consequence:

\[ \Delta_i X = X(t + il) - X(t + (i-1)l) = \mu l + \sigma\sqrt{l}\,U(i) \qquad \text{(A.1)} \]

This means that:

• the increments of $X$ have a mean and a variance that are constant per unit
of time, equal to $\mu$ and $\sigma^2$ respectively. The mean and variance of
$X(t) - X(s)$ are then proportional to $t - s$, $\forall s < t$

• in the limit, if $X$ is a continuous Markov process, $X(t)$ is normally
distributed. Indeed, when the number $n$ of sub-intervals tends to infinity:

\[ X(t+1) - X(t) = \lim_{n \to +\infty} \sum_{i=1}^{n} \Delta_i X \]

Given that the $\Delta_i X$ are i.i.d., the central limit theorem (CLT) states
that the increments $X(t+1) - X(t)$ are normally distributed. Furthermore, if
we consider that the elementary variations $\Delta_i X$ are themselves made of
an infinite number of i.i.d. variations on even smaller intervals, then the
$\Delta_i X$ are normally distributed themselves.
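
The construction (A.1) can be checked numerically. The sketch below is an illustration with hypothetical names and arbitrary values of μ and σ: it sums n elementary variations μl + σ√l U(i), where U(i) is deliberately not Gaussian (it takes the values −1 and +1 with equal probability, so that E[U] = 0 and Var[U] = 1), and verifies that the unit increments have a mean close to μ and a variance close to σ².

```python
import random
from statistics import mean, pvariance


def unit_increment(mu, sigma, n, rng):
    """X(t+1) - X(t) built as the sum of n elementary variations (A.1)."""
    l = 1.0 / n
    total = 0.0
    for _ in range(n):
        u = rng.choice((-1.0, 1.0))  # E[U] = 0, Var[U] = 1, not Gaussian
        total += mu * l + sigma * l ** 0.5 * u
    return total


rng = random.Random(7)
mu, sigma = 0.5, 2.0  # arbitrary illustrative values
increments = [unit_increment(mu, sigma, n=100, rng=rng) for _ in range(2000)]

print(f"mean     = {mean(increments):.3f} (mu = {mu})")
print(f"variance = {pvariance(increments):.3f} (sigma^2 = {sigma ** 2})")
```

A histogram of `increments` would also look approximately Gaussian, in line with the central limit theorem argument above, even though each U(i) takes only two values.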

Wiener processes 

The first mathematically rigorous construction of Brownian motion was carried 
out by Wiener in 1923. 

In the limit, (A.1) takes the form of the following differential equation:

\[ dX = \mu\,dt + \sigma\,dW \qquad \text{(A.2)} \]

where $dW = U(t)\sqrt{dt}$, with $U(t)$ standard normal and independent of
$U(t')$ for $t \neq t'$.

$\mu$ and $\sigma^2$ are called respectively the instantaneous expectation (or
"drift") and instantaneous variance of $X$. The process $W$, whose increments
are independent and normally distributed with a null expectation and an
instantaneous variance of 1, is called a "Wiener process" or a "standardised
Brownian motion". The trajectories of the Brownian motion $X$ are:

• continuous

• almost surely nowhere differentiable

Hence, trajectories of $X$ are continuous but characterised by a change of slope
at each time. Moreover, the increments of the process are stationary. The
properties of Wiener processes are described in [PH96].

The Brownian motion is a process whose increments are i.i.d., following a
Gaussian distribution with constant instantaneous expectation and variance. It
can be used when the motion of a system results from a constant force that
imposes a drift ($\mu\,dt$) combined with a succession of random and independent
shocks ($\sigma\,dW$) that impose erratic motions.

We can easily generalise (A.2) to a multidimensional vector $\underline{X}(t)$:

\[ d\underline{X} = \underline{\mu}\,dt + \sigma\,d\underline{W}(t) \qquad \text{(A.3)} \]

where $\underline{\mu}$ is a constant vector, $\underline{\mu} \in \mathbb{R}^n$,
$\underline{W}$ is a Wiener process of $m$ independent elements ($m \le n$), and
$\sigma$ is an $n \times m$ matrix of constant elements.

Ito processes 

The integral with respect to Brownian motion was developed by Ito in 1944. 

The Brownian motion described above is very particular, especially because the
instantaneous expectation and variance ($\mu$, $\sigma^2$) are supposed to be constant.

Let us extend (A.3) to the case where $\mu$ and $\sigma$ are not constant but depend on
the time $t$ and the value of X:

$$dX(t) = \mu(t, X(t))\,dt + \sigma(t, X(t))\,dW(t) \tag{A.4}$$

If a solution to (A.4) exists, then it should take the form:

$$X(t) = X(s) + \int_s^t \mu(r, X(r))\,dr + \int_s^t \sigma(r, X(r))\,dW(r) \tag{A.5}$$

where $s < t$ is the current instant.

If $\mu(t, X(t))$ and $\sigma(t, X(t))$ respect two conditions, called Ito's conditions, then
(A.5) admits a unique solution $X(t)$, and this solution is a Markov process.
Ito's conditions are:

$\forall t \in [0, T]$, $x, y \in \Omega$, where $\Omega$ is an open subset of $\mathbb{R}^n$, there exist two
constants $C$ and $K$ such that:

• $\| \mu(t,x) \| \le C(1 + \| x \|)$; $\| \sigma(t,x) \| \le C(1 + \| x \|)$

• $\| \mu(t,x) - \mu(t,y) \| \le K \| x - y \|$; $\| \sigma(t,x) - \sigma(t,y) \| \le K \| x - y \|$

Stochastic processes that obey (A.4) and whose instantaneous expectation and
variance respect Ito's conditions are called "diffusion processes" or "Ito
processes".
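
A diffusion of the form (A.4) is usually simulated by discretising time. The sketch below uses the Euler-Maruyama scheme, which is a standard choice but is not mentioned in the text. The function name, the mean-reverting drift 2(1 - x) and the constant volatility 0.3 are hypothetical values chosen for illustration; both coefficients are linear in x, so they satisfy the growth and Lipschitz conditions above.

```python
import math
import random

def euler_maruyama(mu, sigma, x0, t0, t1, n_steps, rng):
    """Approximate one path of dX = mu(t, X) dt + sigma(t, X) dW on [t0, t1]."""
    dt = (t1 - t0) / n_steps
    t, x = t0, x0
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment, variance dt
        x = x + mu(t, x) * dt + sigma(t, x) * dw
        t += dt
        path.append(x)
    return path

# Hypothetical mean-reverting example: the drift pulls X back towards 1.0.
path = euler_maruyama(
    mu=lambda t, x: 2.0 * (1.0 - x),
    sigma=lambda t, x: 0.3,
    x0=0.0, t0=0.0, t1=5.0, n_steps=1000,
    rng=random.Random(1),  # fixed seed, for reproducibility only
)
```

Setting the volatility function to zero turns the scheme into the ordinary Euler method, which is a convenient sanity check: the path then converges smoothly to the reversion level.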

Ito's lemma 

Let us consider a unidimensional diffusion process $X(t)$:

$$dX = \mu(t, X)\,dt + \sigma(t, X)\,dW$$

where $\mu(t, X)$ and $\sigma(t, X)$ respect Ito's conditions and $W$ is a Wiener process.

Let $f$ be a function from $\mathbb{R}^2$ to $\mathbb{R}$, once continuously differentiable with
respect to $t$ and twice continuously differentiable with respect to $X$:

$$(t, X) \mapsto f(t, X)$$

Ito's lemma is:

$$df = \frac{\partial f}{\partial t}\,dt + \frac{\partial f}{\partial X}\,dX + \frac{1}{2}\frac{\partial^2 f}{\partial X^2}\,(dX)^2 \tag{A.6}$$

Equation (A.6) is very important since it is the basis of the differential calculus
of stochastic functions. The main difference with usual differential calculus is the
presence of the term $\frac{1}{2}\frac{\partial^2 f}{\partial X^2}(dX)^2$.
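
As an illustration of how the lemma is used, the worked example below applies it to the logarithm of a geometric Brownian motion. The example is an assumption made here for illustration (it is not part of the derivation above): μ and σ are taken as constants, and the usual rule $(dW)^2 = dt$ is used so that second-order terms in $dt$ are dropped.

```latex
% Assumed example: geometric Brownian motion and f(t, X) = \ln X
\begin{align*}
dX &= \mu X\,dt + \sigma X\,dW,
  \qquad (dX)^2 = \sigma^2 X^2\,dt \\
\frac{\partial f}{\partial t} &= 0, \qquad
\frac{\partial f}{\partial X} = \frac{1}{X}, \qquad
\frac{\partial^2 f}{\partial X^2} = -\frac{1}{X^2} \\
d(\ln X) &= \frac{1}{X}\,dX - \frac{1}{2X^2}\,\sigma^2 X^2\,dt
          = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW
\end{align*}
```

The correction $-\sigma^2/2$ in the drift comes entirely from the second-order term, which has no counterpart in ordinary differential calculus.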



Appendix B 
Elements of statistics 



Central Limit Theorem (CLT) 



The central limit theorem considers a series of random variables $X_1, X_2, \dots, X_n$,
independent and identically distributed ("i.i.d.") with finite mean $\mu$ and finite
variance $\sigma^2$, and states that, in distribution:

$$\frac{\frac{1}{n}\sum_{i=1}^{n} X_i - \mu}{\sigma/\sqrt{n}} \;\xrightarrow[n \to +\infty]{}\; N(0, 1)$$

This fundamental theorem indicates that the sum of a large number of independent
events is approximately normal. In other words, the distribution of an average
will tend to be normal as the sample size $n$ increases, regardless of the distribution
from which the average is taken (the parent distribution), except when the
moments of the parent distribution do not exist (viz. are not finite).
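
A small simulation makes the statement concrete. The sketch below is an illustration under assumed choices: the exponential parent distribution (mean 1, variance 1), the function names and the sample sizes are not taken from the text. It measures how often the standardised average falls below -1.645, a threshold that a standard normal variable undercuts about 5% of the time.

```python
import math
import random

def standardised_mean(n, rng):
    """Standardised average of n i.i.d. Exponential(1) draws (mean 1, variance 1)."""
    avg = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (avg - 1.0) * math.sqrt(n)

def lower_tail_frequency(n, trials, rng):
    """Fraction of standardised averages falling below -1.645."""
    hits = sum(standardised_mean(n, rng) < -1.645 for _ in range(trials))
    return hits / trials

rng = random.Random(42)  # fixed seed, for reproducibility only
freq_n1 = lower_tail_frequency(1, 4000, rng)      # a single draw: never below -1
freq_n200 = lower_tail_frequency(200, 4000, rng)  # should approach 0.05
```

With a single draw the lower tail is empty, because an exponential variable cannot lie more than one standard deviation below its mean. With 200 draws the frequency is close to the Gaussian value.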

Kurtosis 

Kurtosis measures the degree of peakedness of a distribution and is also called
the "excess" or "excess coefficient". It is a normalised form of the fourth central
moment of a distribution. There are several types of kurtosis commonly encountered,
including Fisher kurtosis (denoted $\gamma_2$ and also known as the kurtosis excess)

$$\gamma_2 = \frac{\mu_4}{\mu_2^2} - 3$$

and Pearson kurtosis (denoted $\beta_2$)

$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$

Here, $\mu_i$ denotes the $i$th central moment

$$\mu_i = E[(X - \mu)^i]$$

where $\mu$ is the mean of the distribution.

If not specifically qualified, the term "kurtosis" is generally taken to refer to
Fisher kurtosis. A distribution with a high peak is called leptokurtic, a flat-topped
curve is called platykurtic, and the normal distribution is called mesokurtic.
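
The two definitions differ only by the constant 3, as the following sketch shows. The function names and the two toy samples are hypothetical, and the moments are the plain (biased) sample central moments rather than any bias-corrected estimator.

```python
def central_moment(data, order):
    """Sample central moment of the given order (divides by n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)

def pearson_kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2

def fisher_kurtosis(data):
    return pearson_kurtosis(data) - 3.0

two_point = [-1.0, 1.0] * 10          # flat, two-valued sample: platykurtic
spiky = [0.0] * 18 + [-3.0, 3.0]      # mostly centred with two outliers: leptokurtic
```

The flat sample has a negative Fisher kurtosis, while the sample dominated by a central peak and two outliers has a strongly positive one.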

Normal Probability Plot 

The normal probability plot is a graphical technique for assessing whether or not 
a data set is approximately normally distributed. The data are plotted against 
a theoretical normal distribution in such a way that the points should form an 
approximate straight line. Departures from this straight line indicate departures 
from normality. 

The normal probability plot is formed by: 

• Vertical axis: Ordered response values 

• Horizontal axis: Normal order statistic medians 

The observations are plotted as a function of the corresponding normal order 
statistic medians which are defined as: 

$$N(i) = G(U(i))$$

where $U(i)$ are the uniform order statistic medians (defined below, where they are
written $m(i)$) and $G$ is the percent point function of the normal distribution. The
percent point function is the inverse of the cumulative distribution function
(probability that $x$ is less than or equal to some value). That is, given a
probability, we want the corresponding $x$ of the cumulative distribution function.







The uniform order statistic medians are defined as:

$$m(1) = 1 - m(n)$$
$$m(i) = \frac{i - 0.3175}{n + 0.365} \quad \text{for } i = 2, 3, \dots, n-1$$
$$m(n) = 0.5^{1/n}$$
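
The coordinates of a normal probability plot follow directly from these definitions. The sketch below is a minimal illustration; the function names are hypothetical, and the percent point function G is taken from Python's standard library (`statistics.NormalDist.inv_cdf`).

```python
from statistics import NormalDist

def uniform_order_statistic_medians(n):
    """Approximate medians m(1), ..., m(n) of the order statistics of n uniform draws."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)   # m(n)
    m[0] = 1.0 - m[-1]         # m(1) = 1 - m(n)
    return m

def normal_probability_plot_points(data):
    """Pairs (normal order statistic median, ordered response) to be plotted."""
    medians = uniform_order_statistic_medians(len(data))
    g = NormalDist().inv_cdf  # percent point function G of the standard normal
    return [(g(u), y) for u, y in zip(medians, sorted(data))]

points = normal_probability_plot_points([3.0, 1.0, 2.0, 5.0, 4.0])
```

Each pair gives the horizontal and vertical coordinate of one plotted point; for normally distributed data the points lie close to a straight line.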



In addition, a straight line can be fitted to the points and added as a reference line.
The further the points vary from this line, the greater the indication of departures
from normality. The correlation coefficient of the points on the normal probability
plot can be compared to a table of critical values to provide a formal test of the
hypothesis that the data come from a normal distribution.

The underlying assumptions for a measurement process are that the data 
should behave like: 

• random drawings 

• from a fixed distribution 

• with fixed location 

• with fixed scale 

Probability plots are used to assess the assumption of a fixed distribution. In 
particular, most statistical models are of the form: 

response = deterministic + random 
where the deterministic part is the fit and the random part is error. This er- 
ror component in most common statistical models is specifically assumed to be 
normally distributed with fixed location and scale. This is the most frequent 
application of normal probability plots. That is, a model is fit and a normal 
probability plot is generated for the residuals from the fitted model. If the residuals
from the fitted model are not normally distributed, then one of the major
assumptions of the model has been violated. 



Statistical tests terminology 



Here are a few definitions of the terminology used in our statistical tests:

• The null hypothesis $H_0$ is the original assertion. For instance, $H_0$ = "The
empirical points follow a normal distribution". The null hypothesis is always
tested against an alternative hypothesis $H_1$. For instance, $H_1$ = "The
empirical points do not follow a normal distribution".

• The significance level is related to the degree of certainty you require in
order to reject the null hypothesis in favour of the alternative. With a
finite sample you cannot be certain about your conclusion, so you decide
in advance to reject the null hypothesis if the probability of observing your
sampled result is less than the significance level. For a typical significance
level of 5%, the notation is $\alpha = 0.05$. For this significance level, the
probability of incorrectly rejecting the null hypothesis when it is actually true is
5%. If you need more protection from this error, then choose a lower value
of $\alpha$.

• The p-value is the probability, under the assumption that the null hypothesis
is true, of observing a sample result at least as extreme as the one obtained.
If the p-value is less than $\alpha$, then you reject the null hypothesis. For
example, if $\alpha = 0.05$ and the p-value is 0.03, then you reject the null
hypothesis. The converse is not true: if the p-value is greater than $\alpha$, you
have insufficient evidence to reject the null hypothesis, which does not prove
that it holds.

Kolmogorov-Smirnov Goodness-of-Fit Test 

The Kolmogorov-Smirnov test (K-S, [LR67]) is used to decide whether a sample
comes from a population with a specific distribution. The K-S test is based on 
the empirical cumulative distribution function (ECDF). 

The empirical distribution function is compared with the model cumulative 
distribution function. The K-S test is based on the maximum distance between 
these two curves. It tests a simple hypothesis, which means that the parameters 
of the expected distribution must not be derived from the empirical data, but 
must be specified in advance. 

The Kolmogorov-Smirnov test is defined by:

• $H_0$: The data follow a specified distribution

• $H_a$: The data do not follow the specified distribution

• Test Statistic: The Kolmogorov-Smirnov test statistic is defined as

$$D = \max_{1 \le i \le N} \left| F(Y_i) - \frac{i}{N} \right|$$

where $Y_1, \dots, Y_N$ are the $N$ data points sorted in increasing order and $F$ is
the theoretical cumulative distribution of the distribution being tested, which
must be a continuous distribution (i.e., no discrete distributions such as the
binomial or Poisson), and must be fully specified (i.e., the location, scale,
and shape parameters cannot be estimated from the data).

• Significance Level: a 

• Critical Values: The hypothesis regarding the distributional form is rejected 
if the test statistic, D, is greater than the critical value obtained from a 
table. 

An attractive feature of this test is that the distribution of the K-S test statistic
itself does not depend on the underlying cumulative distribution function being
tested. Another advantage is that it is an exact test (the chi-square
goodness-of-fit test depends on an adequate sample size for the approximations to be valid).
Despite these advantages, the K-S test has several important limitations:

1. It only applies to continuous distributions. 

2. It tends to be more sensitive near the center of the distribution than at the 
tails. 

3. Perhaps the most serious limitation is that the distribution must be fully 
specified. That is, if location, scale, and shape parameters are estimated 
from the data, the critical region of the K-S test is no longer valid. It 
typically must be determined by simulation. 
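
The statistic itself is simple to compute. The sketch below is a minimal illustration with hypothetical function names and a made-up sample. It uses the common two-sided refinement that checks the empirical distribution function on both sides of each jump, and the large-sample approximation 1.36/√N for the 5% critical value, which is an assumption here and is rough for a sample this small (exact tables should be preferred).

```python
import math
from statistics import NormalDist

def ks_statistic(data, cdf):
    """Largest gap between the ECDF of `data` and the fully specified `cdf`."""
    ys = sorted(data)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        f = cdf(y)
        # The ECDF jumps from (i-1)/n to i/n at y, so both sides are checked.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_critical_value_5pct(n):
    """Large-sample approximation of the 5% critical value (assumed, not exact)."""
    return 1.36 / math.sqrt(n)

sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]  # hypothetical data
d = ks_statistic(sample, NormalDist(0.0, 1.0).cdf)  # parameters fixed in advance
reject = d > ks_critical_value_5pct(len(sample))
```

Note that the tested distribution is N(0, 1) with parameters chosen before looking at the data, in line with the third limitation above.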



Chi-Square Goodness-of-Fit Test 

The chi-square test is used to test if a sample of data came from a population 
with a specific distribution. 

An attractive feature of the chi-square goodness-of-fit test is that it can be
applied to any univariate distribution for which you can calculate the cumulative
distribution function. The chi-square goodness-of-fit test is applied to binned
data (i.e., data put into classes).

The chi-square test is an alternative to the Anderson-Darling and
Kolmogorov-Smirnov goodness-of-fit tests. The chi-square goodness-of-fit test can be applied
to discrete distributions such as the binomial and the Poisson. The
Kolmogorov-Smirnov and Anderson-Darling tests are restricted to continuous distributions.

The test statistic $\chi^2$ follows, approximately, a chi-square distribution with
$k - c$ degrees of freedom, where $k$ is the number of non-empty cells and $c$ = the
number of estimated parameters (including location, scale and shape parameters)
for the distribution + 1. For example, for a 3-parameter Weibull distribution,
$c = 4$.

Therefore, the hypothesis that the data are from a population with the specified
distribution is rejected if $\chi^2 > \chi^2(\alpha, k - c)$, where $\chi^2(\alpha, k - c)$ is the
chi-square percent point function with $k - c$ degrees of freedom and a significance
level of $\alpha$.
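
The text does not spell out the statistic itself. The sketch below assumes the usual Pearson form, the sum over cells of (observed - expected)² / expected; the function names and the bin counts are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Pearson statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

def degrees_of_freedom(observed, n_estimated_params):
    """k - c, with k the number of non-empty cells and c = estimated parameters + 1."""
    k = sum(1 for o in observed if o > 0)
    return k - (n_estimated_params + 1)

observed = [18, 22, 30, 30]          # hypothetical bin counts
expected = [25.0, 25.0, 25.0, 25.0]  # counts implied by the tested distribution
stat = chi_square_statistic(observed, expected)
dof = degrees_of_freedom(observed, n_estimated_params=0)
```

Here the statistic is 4.32 with 3 degrees of freedom. The tabulated 5% critical value for 3 degrees of freedom is about 7.81, so the hypothesis would not be rejected for these counts.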

Jarque-Bera Goodness-of-Fit Test 

The Bera-Jarque test is a parametric hypothesis test of composite normality. It 
determines if the null hypothesis of composite normality is a reasonable assump- 
tion regarding the population distribution of the observed data X, at a given 
significance level a. 

The Bera-Jarque hypotheses are: 

• Null Hypothesis: X is normal with unspecified mean and variance. 

• Alternative Hypothesis: X is not normally distributed. 

The Bera-Jarque test is a 2-sided test of composite normality with sample
mean and sample variance used as estimates of the population mean and vari- 
ance, respectively. The test statistic is based on estimates of the sample skewness 
and kurtosis of the normalised data (the standardised Z-scores computed from 
X by subtracting the sample mean and normalising by the sample standard de- 
viation). Under the null hypothesis, the standardised 3rd and 4th moments are 
asymptotically normal and independent, and the test statistic has a Chi-square 
distribution with two degrees of freedom. Note that the Bera-Jarque test is an 
asymptotic test, and care should be taken with small sample sizes. 
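
The description above can be sketched as follows, assuming the usual form of the statistic, JB = n/6 · (S² + (K - 3)²/4), with S the sample skewness and K the sample (Pearson) kurtosis. The formula and the function name are assumptions, since the text does not state them. The p-value uses the fact that a chi-square variable with two degrees of freedom has survival function exp(-x/2).

```python
import math

def jarque_bera(data):
    """Return (JB statistic, asymptotic p-value) for the sample `data`."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)

jb, p_value = jarque_bera([-1.0, 1.0] * 10)  # hypothetical two-valued sample
```

Because the p-value is asymptotic, it should be read with caution for a sample as small as this one.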
Lilliefors Goodness-of-Fit Test

The Lilliefors test is a test of goodness of fit to a normal distribution. It evaluates the
hypothesis that observed data X have a normal distribution with unspecified mean 
and variance, against the alternative that X do not have a normal distribution. 
This test compares the empirical distribution of X with a normal distribution 
having the same mean and variance as X. It is similar to the Kolmogorov-Smirnov 
test, but it adjusts for the fact that the parameters of the normal distribution 
are estimated from X rather than specified in advance. Thus, it determines if the 
null hypothesis of composite normality is a reasonable assumption regarding the 
population distribution of the observed data X. 

Let $S(x)$ be the empirical c.d.f. estimated from the sample vector X, $F(x)$ be
the corresponding true (but unknown) population c.d.f., and CDF be a normal
c.d.f. with sample mean and standard deviation taken from X. The Lilliefors
hypotheses and test statistic are:

• Null Hypothesis: $F(x)$ is normal with unspecified mean and variance

• Alternative Hypothesis: $F(x)$ is not normally distributed.

• Test Statistic: $T = \max_x |S(x) - CDF(x)|$

The decision to reject the null hypothesis occurs when the test statistic exceeds 
the critical value. 
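
Because the parameters are estimated from the sample, the critical value cannot be read from the Kolmogorov-Smirnov table. The sketch below estimates it by Monte Carlo simulation under the null hypothesis, which is one possible approach and is assumed here rather than taken from the text. The function names, the number of trials and the seed are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def lilliefors_statistic(data):
    """max |S(x) - CDF(x)| with the normal CDF fitted to the sample itself."""
    ys = sorted(data)
    n = len(ys)
    fitted = NormalDist(fmean(ys), stdev(ys))
    t = 0.0
    for i, y in enumerate(ys, start=1):
        f = fitted.cdf(y)
        # S(x) jumps from (i-1)/n to i/n at y, so both sides are checked.
        t = max(t, i / n - f, f - (i - 1) / n)
    return t

def simulated_critical_value(n, alpha=0.05, trials=1000, seed=0):
    """Monte Carlo estimate of the critical value under the null hypothesis."""
    rng = random.Random(seed)
    stats = sorted(
        lilliefors_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return stats[int((1.0 - alpha) * trials) - 1]

critical_50 = simulated_critical_value(50)
```

The simulated critical value is noticeably smaller than the Kolmogorov-Smirnov value for the same sample size, because fitting the mean and variance to the data pulls the fitted c.d.f. towards the empirical one.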

Maximum Likelihood Ratio Test 

Let $L_1$ be the maximum value of the likelihood of the data without the additional
assumption. In other words, $L_1$ is the likelihood of the data with all the
parameters unrestricted and maximum likelihood estimates substituted for these
parameters.

Let $L_0$ be the maximum value of the likelihood when the parameters are restricted
(and reduced in number) based on the assumption. Assume $k$ parameters
were lost (i.e., $L_0$ has $k$ fewer parameters than $L_1$).

Form the ratio $\lambda = L_0/L_1$. This ratio is always between 0 and 1, and the less
likely the assumption is, the smaller $\lambda$ will be. This can be quantified at a given
confidence level as follows:

1. Calculate $\chi^2 = -2 \ln \lambda$. The smaller $\lambda$ is, the larger $\chi^2$ will be.

2. We can tell when $\chi^2$ is significantly large by comparing it to the upper
$100 \times (1 - \alpha)$ percentile point of a Chi-Square distribution with $k$ degrees
of freedom. $\chi^2$ has an approximate Chi-Square distribution with $k$ degrees
of freedom and the approximation is usually good, even for small sample
sizes.

3. The likelihood ratio test computes $\chi^2$ and rejects the assumption if $\chi^2$ is
larger than a Chi-Square percentile with $k$ degrees of freedom, where the
percentile corresponds to the confidence level chosen by the analyst.
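
The steps above can be sketched on a simple case. The example below is hypothetical: it tests the assumption "the mean is zero" on a Gaussian sample, so that one parameter is lost (k = 1). The function names and data are illustrative only. With one degree of freedom the chi-square survival function can be written with the complementary error function, which avoids any external library.

```python
import math

def max_log_likelihood_normal(data, mean):
    """Gaussian log-likelihood at `mean`, with the variance set to its ML estimate."""
    n = len(data)
    var = sum((x - mean) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def likelihood_ratio_test_zero_mean(data):
    """Test the assumption 'mean = 0' (k = 1 parameter lost) on a Gaussian sample."""
    log_l0 = max_log_likelihood_normal(data, 0.0)                    # restricted
    log_l1 = max_log_likelihood_normal(data, sum(data) / len(data))  # unrestricted
    chi2 = max(0.0, -2.0 * (log_l0 - log_l1))  # -2 ln(lambda), guarded against rounding
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

chi2, p_value = likelihood_ratio_test_zero_mean([1.0, 2.0, 3.0])
```

Working with log-likelihoods rather than with the ratio itself avoids numerical underflow when the sample is large.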

Kruskal-Wallis Test 

The Kruskal-Wallis test is a nonparametric version of one-way analysis of variance.
The assumption behind this test is that the measurements come from a
continuous distribution, but not necessarily a normal distribution. The test is
based on an analysis of variance using the ranks of the data values, not the data
values themselves. It returns the p-value for the null hypothesis that
all samples are drawn from the same population (or from different populations
with the same mean). It is based on the Chi-Square distribution.

If the p-value is near zero, this casts doubt on the null hypothesis and suggests
that at least one sample mean is significantly different from the other sample
means.
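
The rank-based statistic can be sketched as follows, assuming the usual form H = 12 / (N(N+1)) · Σ R_j² / n_j - 3(N+1), where R_j is the rank sum of group j. The formula and the function names are assumptions, since the text does not state them, and no tie correction is applied.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for j in range(pos, end + 1):
            ranks[order[j]] = (pos + end) / 2.0 + 1.0
        pos = end + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic (without tie correction) for a list of samples."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * h - 3.0 * (n_total + 1)

h = kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # hypothetical groups
```

The value of H is then compared with a Chi-Square distribution whose number of degrees of freedom is the number of groups minus one.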

Analysis of Variance 

The purpose of one-way analysis of variance ("ANOVA") is to find out whether 
data from several groups have a common mean. That is, to determine whether 
the groups are actually different in the measured characteristic. 

One-way ANOVA is a simple special case of the linear model. The one-way
ANOVA form of the model is

$$y_{ij} = \alpha_{.j} + \varepsilon_{ij}$$

where $y_{ij}$ is a matrix of observations in which each column represents a different group,
$\alpha_{.j}$ is a matrix whose columns are the group means (the "dot j" notation means
that $\alpha$ applies to all rows of the $j$th column; that is, the value $\alpha_{ij}$ is the same for all $i$), and
$\varepsilon_{ij}$ is a matrix of random disturbances.

The model posits that the columns of $y$ are a constant plus a random disturbance.
You want to know if the constants are all the same. The p-values returned
by the ANOVA test depend on assumptions about the random disturbances $\varepsilon_{ij}$ in
the model equation. For the p-values to be correct these disturbances need to be
independent, normally distributed, and have constant variance. Some nonparametric
methods like the Kruskal-Wallis Test do not require a normal distribution.
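
The comparison of group means rests on the F statistic, the ratio of the between-group to the within-group mean squares. The sketch below is a minimal illustration with a hypothetical function name and made-up data. It returns the statistic and its degrees of freedom, and leaves the p-value to an F table or a statistics library.

```python
def one_way_anova_f(groups):
    """F statistic and its (between, within) degrees of freedom."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f_stat, df1, df2 = one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

A large F indicates that the variation between the group means is large compared with the variation inside the groups.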



